POLYKETIDES AND THEIR SYNTHESIS

The present invention relates to processes and
materials (including enzyme systems, nucleic acids,
vectors and cultures) for preparing polyketides,
particularly polyethers but including polyenes,
macrolides and other polyketides by recombinant
synthesis, and to the polyketides so produced,
particularly novel polyketides. (N.B. the term
"polyketide" is being used in its conventional sense to
include structures notionally derived by the reduction
and/or other processing or modification of one or more
ketide units). Furthermore the invention provides the
entire nucleic acid sequence of the biosynthetic gene
cluster that governs the production of the ionophoric
antibiotic polyether polyketide monensin in Streptomyces
cinnamonensis, and the use of all or part of the cloned
DNA first, in the specific detection of other polyether
biosynthetic gene clusters; secondly in the engineering
of mutant strains of S. cinnamonensis and of other
actinomycetes which are suitable host strains for the
high level production of novel recombinant polyketides;
and thirdly in the provision of recombinant biosynthetic
genes which lead to such novel polyketide products.

Polyketides are a large and structurally diverse
class of natural products that includes many compounds
possessing antibiotic or other pharmacological
properties, such as erythromycin, tetracyclines,
rapamycin, avermectin, monensin, epothilones and FK506.
In particular, polyketides are abundantly produced by
Streptomyces and related actinomycete bacteria. They are
synthesised by the repeated stepwise condensation of
acylthioesters in a manner analogous to that of fatty
acid biosynthesis. The greater structural diversity found
among natural polyketides arises from the selection of
(usually) acetate or propionate as "starter" or
"extender" units; and from the differing degree of
processing of the β-keto group observed after each
condensation. Examples of processing steps include
reduction to β-hydroxyacyl-, reduction followed by
dehydration to 2-enoyl-, and complete reduction to the
saturated acylthioester. The stereochemical outcome of
these processing steps is also specified for each cycle
of chain extension. In addition, the biosynthetic
pathways to many polyketides involve additional
enzyme-catalysed modifications which may include
methylation by O- and C-methyltransferases, hydroxylation
by cytochrome P450 enzymes, other oxidation or reduction
processes, and the biosynthesis and attachment of novel
sugars and/or deoxysugars.
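
The processing alternatives listed above can be summarised
in a short sketch. The following Python is illustrative
only: the domain abbreviations KR (ketoreductase), DH
(dehydratase) and ER (enoylreductase) and the function name
are assumptions introduced here, and the sketch assumes
that each reductive step can act only on the product of
the preceding one.

```python
def beta_carbon_state(domains):
    """Return the state left at the beta-carbon after one extension cycle.

    `domains` is an iterable of reductive-domain abbreviations that are
    present and active in the module (hypothetical labels: KR, DH, ER).
    """
    present = set(domains)
    if "KR" not in present:
        return "beta-keto"        # no processing of the keto group
    if "DH" not in present:
        return "beta-hydroxyacyl"  # reduction only
    if "ER" not in present:
        return "2-enoyl"          # reduction followed by dehydration
    return "saturated"            # complete reduction


if __name__ == "__main__":
    for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
        print(combo, "->", beta_carbon_state(combo))
```

Stereochemical outcome, which the text notes is also
specified in each cycle, is deliberately left out of this
sketch.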

The biosynthesis of polyketides is initiated by a
group of chain-forming enzymes known as polyketide
synthases. Two classes of polyketide synthase (PKS) have
been described in actinomycetes. One class, named Type I
PKSs, represented by the PKSs for the macrolides
erythromycin, oleandomycin, avermectin and rapamycin,
consists of a different set or "module" of enzymes for
each cycle of polyketide chain extension. (For examples
see Cortés, J. et al. Nature (1990) 348:176-178; Donadio,
S. et al. Science (1991) 252:675-679; Swan, D.G. et al.
Mol. Gen. Genet. (1994) 242:358-362; MacNeil, D.J. et al.
Gene (1992) 115:119-125; Schwecke, T. et al. Proc. Natl.
Acad. Sci. USA (1995) 92:7839-7843.)

The term "extension module" as used herein refers to
the set of contiguous domains, from a β-ketoacyl-ACP
synthase ("KS") domain to the next acyl carrier protein
("ACP") domain, which accomplishes one cycle of
polyketide chain extension. The term "loading module" is
used to refer to any group of contiguous domains which
accomplishes the loading of the starter unit onto the PKS
and thus renders it available to the KS domain of the
first extension module. The length of polyketide formed
has been altered, in the case of erythromycin
biosynthesis, by specific relocation using genetic
engineering of the enzymatic domain of the
erythromycin-producing PKS that contains the chain
releasing thioesterase/cyclase activity (Cortés, J. et al.
Science (1995) 268:1487-1489; Kao, C.M. et al. J. Am.
Chem. Soc. (1995) 117:9105-9106).
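
The definitions of "extension module" and "loading
module" given above can be expressed as a small parsing
sketch. The Python below is illustrative only: the
function name, the flat list representation and the
example domain order are assumptions made here, not data
taken from any sequenced PKS. Domains before the first KS
are treated as the loading module, each run from a KS to
the next ACP as one extension module, and anything found
outside such a run afterwards (for example a relocated
thioesterase, "TE") as trailing.

```python
def split_modules(domains):
    """Split an ordered list of domain names into
    (loading_module, extension_modules, trailing_domains)."""
    loading, extensions, trailing = [], [], []
    current = None  # the extension module being assembled, if any
    for name in domains:
        if name == "KS":
            if current is not None:
                raise ValueError("KS found before the previous module reached its ACP")
            current = ["KS"]
        elif current is not None:
            current.append(name)
            if name == "ACP":  # the ACP closes the extension module
                extensions.append(current)
                current = None
        elif extensions:
            trailing.append(name)
        else:
            loading.append(name)
    if current is not None:
        raise ValueError("last KS has no closing ACP")
    return loading, extensions, trailing


if __name__ == "__main__":
    # Hypothetical bimodular multi-enzyme with a terminal thioesterase.
    example = ["AT", "ACP",
               "KS", "AT", "KR", "ACP",
               "KS", "AT", "KR", "ACP",
               "TE"]
    loading, extensions, trailing = split_modules(example)
    print("loading module:", loading)
    print("extension modules:", len(extensions))
    print("trailing domains:", trailing)
```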

In-frame deletion of the DNA encoding part of the
ketoreductase domain in module 5 of the
erythromycin-producing PKS (also known as
6-deoxyerythronolide B synthase, DEBS) has been shown to
lead to the formation of erythromycin analogues
5,6-dideoxy-3-α-mycarosyl-5-oxoerythronolide B,
5,6-dideoxy-5-oxoerythronolide B and
5,6-dideoxy,6-β-epoxy-5-oxoerythronolide B (Donadio, S.
et al. Science (1991) 252:675-679). Likewise, alteration
of active site residues in the enoylreductase domain of
module 4 in DEBS, by genetic engineering of the
corresponding PKS-encoding DNA and its introduction into
Saccharopolyspora erythraea, led to the production of
6,7-anhydroerythromycin C (Donadio, S. et al. Proc. Natl.
Acad. Sci. USA (1993) 90:7119-7123).

International Patent Application number WO 93/13663
describes additional types of genetic manipulation of the
DEBS genes that are capable of producing altered
polyketides. However many such attempts are reported to
have been unproductive (Hutchinson, C.R. and Fujii, I.
Annu. Rev. Microbiol. (1995) 49:201-238, at p. 231). The
complete DNA sequence of the genes from Streptomyces
hygroscopicus that encode the modular Type I PKS
governing the biosynthesis of the macrocyclic
immunosuppressant polyketide rapamycin has been disclosed
(Schwecke, T. et al. (1995) Proc. Natl. Acad. Sci. USA
92:7839-7843). The DNA sequence is deposited in the
EMBL/Genbank Database under the accession number X86780.

WO 98/01546 discloses that a PKS gene assembly
(particularly of Type I) encodes a loading module which
is followed by at least one extension module. The first
open reading frame encodes the first multi-enzyme or
cassette (DEBS1) which consists of three modules: the
loading module (ery-load) and two extension modules
(modules 1 and 2). The loading module comprises an
acyltransferase and an acyl carrier protein. This may be
contrasted with Figure 1 of WO 93/13663 (referred to
above). This shows ORF1 as only two modules, the first of
which is in fact both the loading module and the first
extension module.

WO 98/01546 describes in general terms the
production of a hybrid PKS gene assembly comprising a
loading module and at least one extension module. It also
describes (see also Marsden, A.F.A. et al. Science (1998)
279:199-202) construction of a hybrid PKS gene assembly
by grafting the wide-specificity loading module for the
avermectin-producing polyketide synthase onto the first
multi-enzyme component (DEBS1) for the erythromycin PKS
in place of the normal loading module. Certain novel
polyketides can be prepared using the hybrid PKS gene
assembly, as described for example in WO 98/01571.

WO 98/01546 further describes the construction of a
hybrid PKS gene assembly by grafting the loading module
for the rapamycin-producing polyketide synthase onto the
first multi-enzyme component (DEBS1) for the erythromycin
PKS in place of the normal loading module. The loading
module of the rapamycin PKS differs from the loading
modules of DEBS and the avermectin PKS in that it
comprises a CoA ligase domain, an enoylreductase ("ER")
domain and an ACP, so that suitable organic acids
including the natural starter unit
3,4-dihydroxycyclohexane carboxylic acid may be activated
in situ on the PKS loading domain and, with or without
reduction by the ER domain, transferred to the ACP for
intramolecular loading of the KS of extension module 1
(Schwecke, T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843). WO 98/51695 and WO 98/49315 describe
additional types of genetic manipulation of the DEBS
genes that are capable of producing altered polyketides.

The second class of PKS, named Type II PKSs, is
represented by the synthases for aromatic compounds. Type
II PKSs contain only a single set of enzymatic activities
for chain extension and these are re-used as appropriate
in successive cycles (Bibb, M.J. et al. EMBO J. (1989)
8:2727-2736; Sherman, D.H. et al. EMBO J. (1989)
8:2717-2725; Fernandez-Moreno, M.A. et al. J. Biol. Chem.
(1992) 267:19278-19290). The "extender" units for the
Type II PKSs are usually acetate units, and the presence
of specific cyclases dictates the preferred pathway for
cyclisation of the completed chain into an aromatic
product (Hutchinson, C.R. and Fujii, I. Annu. Rev.
Microbiol. (1995) 49:201-238). Hybrid polyketides have
been obtained by the introduction of cloned Type II PKS
gene-containing DNA into another strain containing a
different Type II PKS gene cluster, for example by
introduction of DNA derived from the gene cluster for
actinorhodin, a blue-pigmented polyketide from
Streptomyces coelicolor, into an anthraquinone
polyketide-producing strain of Streptomyces galilaeus
(Bartel, P.L. et al. J. Bacteriol. (1990) 172:4816-4826).

The minimal number of domains required for
polyketide chain extension on a Type II PKS when
expressed in a Streptomyces coelicolor host cell (the
"minimal PKS") has been defined for example in WO
95/08548 as containing the following three polypeptides
which are products of the actI genes: firstly KS;
secondly a polypeptide termed the CLF with end-to-end
amino acid sequence similarity to the KS but in which the
essential active site residue of the KS, namely a
cysteine residue, is substituted either by a glutamine
residue or, in the case of the PKS for a spore pigment
such as the whiE gene product (Davis, N.K. and Chater,
K.F. Mol. Microbiol. (1990) 4:1679-1691) by a glutamic
acid residue; and finally an ACP. The CLF has been stated
(for example in WO 95/08548) to be a factor that
determines the chain length of the polyketide chain that
is produced by the minimal PKS. However it has been found
(Shen, B. et al. J. Am. Chem. Soc. (1995) 117:6811-6821)
that when the CLF for the octaketide actinorhodin is used
to replace the CLF for the decaketide tetracenomycin in
host cells of Streptomyces glaucescens, the polyketide
product is not found to be altered from a decaketide to
an octaketide, so the exact role of the CLF remains
unclear. An alternative nomenclature has been proposed in
which KS is designated KSα and CLF is designated KSβ, to
reflect this lack of knowledge (Meurer, G. et al.
Chemistry & Biology (1997) 4:433-443). The mechanism by
which acetate starter units and acetate extender units
are loaded onto the Type II PKS is not known, but it is
speculated that the malonyl-CoA:ACP acyltransferase of
the fatty acid synthase of the host cell can fulfil the
same function for the Type II PKS (Revill, W.P. et al. J.
Bacteriol. (1995) 177:3946-3952).

WO 95/08548 describes the replacement of
actinorhodin PKS genes by heterologous DNA from other
Type II PKS gene clusters, to obtain hybrid polyketides.
It also describes the construction of a strain of
Streptomyces coelicolor which substantially lacks the
native gene cluster for actinorhodin, and the use in that
strain of a plasmid vector pRM5 derived from the low-copy
number vector SCP2* isolated from Streptomyces coelicolor
(Bibb, M.J. and Hopwood, D.A. J. Gen. Microbiol. (1981)
126:427-442) and in which heterologous PKS-encoding DNA
may be expressed under the control of the divergent
actI/actIII promoter region of the actinorhodin gene
cluster (Fernandez-Moreno, M.A. et al. J. Biol. Chem.
(1992) 267:19278-19290). The plasmid pRM5 also contains
DNA from the actinorhodin biosynthetic gene cluster
encoding the gene for a specific activator protein,
ActII-orf4. The ActII-orf4 protein is required for
transcription of the genes placed under the control of
the actI/actIII bidirectional promoter and activates gene
expression during the transition from growth to
stationary phase in the vegetative mycelium (Hallam, S.E.
et al. Gene (1988) 74:305-320).

Type II clusters in Streptomyces are known to be
activated by pathway-specific activator genes (Narva,
K.E. and Feitelson, J.S. J. Bacteriol. (1990)
172:326-333; Stutzman-Engwall, K.J. et al. J. Bacteriol.
(1992) 174:144-154; Fernandez-Moreno, M.A. et al. Cell
(1991) 66:769-780; Takano, E. et al. Mol. Microbiol.
(1992) 6:2797-2804; Gramajo, H.C. et al. Mol. Microbiol.
(1993) 7:837-845). The DnrI gene product complements a
mutation in the actII-orf4 gene of S. coelicolor,
implying that DnrI and ActII-orf4 proteins act on similar
targets. A gene (srmR) has been described (EP 0 524 832
A2) that is located near the Type I PKS gene cluster for
the macrolide polyketide spiramycin. This gene
specifically activates the production of the macrolide
antibiotic spiramycin, but no other examples have been
found of such a gene. Also, no homologues of the
ActII-orf4/DnrI/RedD family of activators have been
described that act on Type I PKS genes. WO 98/01546
describes the use of the ActII-orf4 family of activators
in conjunction with their cognate promoters (e.g.
actII-orf4 with the actI promoter) in a heterologous
actinomycete to obtain high level expression of
recombinant Type I polyketide synthase genes.

Although large numbers of therapeutically important
polyketides have been identified, there remains a need to
obtain novel polyketides that have enhanced properties or
possess completely novel bioactivity. The complex
polyketides produced by Type I PKSs are particularly
valuable, in that they include compounds with known
utility as anthelminthics, insecticides,
immunosuppressants, antifungal agents or antibacterial
agents. Because of their structural complexity, such
novel polyketides are not readily obtainable by total
chemical synthesis, nor by chemical modifications of
known polyketides.

There is also a need to develop reliable and
specific ways of deploying individual genes and portions
of genes in practice so that all, or a large fraction, of
hybrid PKS genes that are constructed are viable and
produce the desired polyketide product. This includes the
development of advantageous host strains for expression
of such genes. For example many polyketides are rendered
bioactive by the action of further enzymes other than the
polyketide synthase, and host strains that contain and
are able to express the genes for such enzymes are
particularly convenient for the efficient synthesis of
the bioactive material. In those cases where the
construction of a known or a novel polyketide requires
specialised precursors, host strains containing and able
to express the genes for key enzymes that enhance the
production of such specialised precursors are equally
valuable and desirable. There is also a need to develop
rational methods of increasing the expression level of
all the genes required for production of a specific
polyketide. Clearly also a host cell which is
advantageous for the above reasons, and/or because of
other favourable characteristics including but not
limited to its speed of growth, excellent handling
characteristics in fermentation, and ease of
transformation with DNA by various techniques, can be
made even more favourable by the cloning into that cell
of such auxiliary genes for polyketide modification, or
gene activation, or post-translational modification, or
precursor supply.



The DNA sequences have been disclosed for several
Type I PKS gene clusters that govern the production of
16-membered macrolide polyketides, including the tylosin
PKS from Streptomyces fradiae (application EP 0 791 655
A2), the niddamycin PKS from Streptomyces caelestis
(Kakavas, S.J. et al. J. Bacteriol. (1997) 179:7515-7522)
and the spiramycin PKS from Streptomyces ambofaciens
(application EP 0 791 655 A2). DNA sequences have also
been disclosed for Type I PKS gene clusters that govern
the production of further complex polyketides, for
example rifamycin from Amycolatopsis mediterranei (WO
98/07868), and soraphen from Sorangium cellulosum (US
5716849), but so far no DNA sequence has been disclosed
for one of the most widespread and important classes of
complex polyketides, the polyethers.

Polyethers form an important group of complex
polyketide antibiotics (Westley, J.W. in "Antibiotics IV.
Biosynthesis" (Corcoran, J.W. Ed.), Springer-Verlag, New
York (1981) p. 41-73). They are polyoxygenated carboxylic
acids which act as selective ionophores transporting
cations across the cell membrane of target cells and
thereby causing depolarisation and cell death. Certain
polyethers including monensin, lasalocid and tetronasin
are in widespread use in animal husbandry as
coccidiostats (principally targeted against Eimeria
spp.) and as growth promoters. Polyethers have also been
reported to be active in vitro and in vivo against the
malarial parasite Plasmodium falciparum (Gumila, C. et
al. Antimicrobial Agents and Chemotherapy (1997)
41:523-529).

Polyethers contain multiple asymmetric centres and
are characterised by the presence of tetrahydrofuran and
tetrahydropyran rings, producing a characteristic shape
which is non-polar on its outer surface and therefore
well adapted for transport of material across bacterial
membranes; and provides on its inner surface polar
coordinating ligands for a centrally-bound metal ion. In
addition to tetrahydrofuran and tetrahydropyran rings,
other groups which are often present include spiroketal,
dispiroketal, and substituted benzoic acid moieties and
occasionally other groups, for example a tetronic acid or
a 6-membered carbocyclic ring.

Monensins A and B are produced by the actinomycete
Streptomyces cinnamonensis. Their structures are shown in
Figure 1. Monensin B differs from monensin A only in the
presence of a methyl sidechain at C-16 rather than an
ethyl sidechain. Monensin selectively binds and
transports sodium ions. In addition to its antibacterial
and antifungal properties monensin has some activity
against protozoal parasites such as the malarial parasite
Plasmodium falciparum. Although the structures of
polyethers differ significantly from those of other
complex polyketides such as the polyhydroxylated and
polyene macrolides, their biosynthesis appears to take
place by a metabolic pathway which has many common
elements. Thus experiments using carbon-14-labelled
precursors have shown that monensin A is synthesised from
five acetate, one butyrate and seven propionate units
(Day, L.E. et al. Antimicrob. Agents Chemother. (1973)
4:410-414). Similarly experiments using precursors
doubly-labelled with carbon-13 and oxygen-18 have shown
that oxygens (O)1, (O)3, (O)4, (O)5, (O)6 and (O)10 of
monensin arise from the carboxylate oxygens of either
propionate or acetate, while growth in the presence of
oxygen-18 oxygen gas demonstrated that the three
remaining ether oxygens (O)7, (O)8 and (O)9 are derived
from molecular oxygen (Cane, D.E. et al. J. Am. Chem.
Soc. (1981) 103:5962-5965; Cane, D.E. et al. J. Am. Chem.
Soc. (1982) 104:7274-7281; Ajaz, A.A. and Robinson,
J.A. J. Chem. Soc. Chem. Commun. (1983) 12:679-680).
These findings have been rationalised by proposing that
the biosynthesis of monensin proceeds via an acyclic
triene intermediate (1) in which the geometry of all
three carbon-carbon double bonds is E (entgegen) rather
than Z (zusammen). The triene is then proposed to be
subject to epoxidation to a tri-epoxide (2) and then ring
opening is proposed to occur with concomitant sequential
formation of the five ether rings as shown in Figure 2A.
Such a biosynthetic pathway, first mooted by Westley in
1974 (Westley, J.W. et al. J. Antibiot. (1974)
27:597-604), accounts for the observed stereochemistry at
the multiple asymmetric centres in monensin (Cane, D.E.
et al. J. Am. Chem. Soc. (1982) 104:7274-7281; Sood, G.R.
et al. J. Chem. Soc. Chem. Commun. (1984) 21:1421-1424)
and analogous schemes can be used to account for the
biosynthesis of other known polyethers such as lasalocid
A (Hutchinson, C.R. et al. J. Am. Chem. Soc. (1981)
103:5953-5956), tetronasin (ICI 139603) (Demetriadou,
A.K. et al. J. Chem. Soc. Chem. Commun. (1985) 7:408-410)
and narasin (Spavold, Z. et al. Tetrahedron Letters
(1986) 27:3299-3302). The hydroxylation at C-26 and the
introduction of an O-methyl group on oxygen 3 are
proposed to occur as late steps in the biosynthesis,
after formation of the polyether structure.
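
The building-block count reported from the carbon-14
labelling experiments can be checked with simple
arithmetic. The Python sketch below is illustrative only:
the names are assumptions introduced here, and the
comparison figure of 36 carbons for monensin A (molecular
formula C36H62O11) comes from the general chemical
literature rather than from this text.

```python
# Carbons contributed by each precursor unit.
CARBONS_PER_UNIT = {"acetate": 2, "propionate": 3, "butyrate": 4}

# Unit counts for monensin A as reported by the labelling study cited above.
MONENSIN_A_UNITS = {"acetate": 5, "butyrate": 1, "propionate": 7}


def backbone_carbons(units):
    """Total carbons supplied by the polyketide building blocks."""
    return sum(CARBONS_PER_UNIT[name] * count for name, count in units.items())


def condensations(units):
    """One unit acts as starter; every further unit needs one condensation."""
    return sum(units.values()) - 1


if __name__ == "__main__":
    print("backbone carbons:", backbone_carbons(MONENSIN_A_UNITS))  # 35
    print("condensations:", condensations(MONENSIN_A_UNITS))        # 12
```

The 35 carbons from the building blocks plus the single
carbon of the late-stage O-methyl group give the 36
carbons of monensin A, and thirteen units imply twelve
chain-extension condensations.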

Unfortunately key aspects of the biosynthetic scheme
shown in Figure 2A have so far eluded experimental
confirmation. No biosynthetic intermediates have been
isolated from mutants of S. cinnamonensis that are
blocked in early stages of monensin production.
26-deoxymonensin A has been isolated from a S.
cinnamonensis mutant partially blocked in monensin
production (Ashworth, D.M. et al. J. Antibiot. (1989)
42:1088-1099) and 3-O-demethylmonensins A and B have been
recovered as minor components from the fermentation broth
of a monensin-producing strain (Pospisil, S. et al. J.
Antibiot. (1987) 40:555-557). When fed to cells of S.
cinnamonensis in radio-labelled form, neither
26-deoxymonensin A, nor 3-O-demethylmonensin A, nor
3-O-demethyl,26-deoxymonensin A was significantly
incorporated into monensin A (Ashworth, D.M. et al. J.
Antibiot. (1989) 42:1088-1099), either because they are
actively excluded or because these modifications in fact
occur earlier in the biosynthetic pathway so that these
metabolites are shunt products not readily converted into
the final antibiotic by the respective hydroxylase or
methyltransferase. Similarly, the putative all-(E)-triene
precursor (1) has been synthesised and shown not to
become incorporated into monensin when fed to growing
cells of S. cinnamonensis (Holmes, D.S. et al. Helv.
Chim. Acta (1990) 73:239-259). An alternative pathway has
been proposed, as shown in Fig 2B, based on the
transition-metal-mediated oxidation of 1,5-dienes (Walba,
D.M. and Edwards, P.D. Tetrahedron Lett. (1980)
21:3531-3534). The triene intermediate (4) would differ
from that of Figure 2A (1) only in that each carbon-carbon
double bond would have the (Z)-configuration (Townsend,
C.A. and Basak, A. Tetrahedron (1991) 47:2591-2602) and
not the (E)-configuration.

The genetic basis of secondary metabolite
biosynthesis essentially exists in the genes which code
for the individual biosynthetic enzymes and in the
regulatory elements which control the expression of the
biosynthetic genes. The genes encoding biosynthesis of
polyketides in actinomycetes have hitherto been found as
clusters of adjacent genes, ranging in size from
20 kilobasepairs (kbp) to over 100 kbp. The clusters
often contain specific regulatory genes and genes
conferring resistance of the producing strain to its own
antibiotic.

In various of its aspects the invention provides the
following:

(1) a DNA sequence encoding at least one peptide
necessary for the biosynthesis of monensin, preferably
comprising one or more of the following genes: mon BI,
mon BII, mon CI, mon CII, mon H, mon RI, mon RII, mon T,
mon AIX and mon AX as depicted in the appended sequence
data or an allele or mutation thereof;

(2) a DNA sequence according to the first aspect
comprising all of the genes listed therein or an allele
or mutation thereof;

(3) a DNA sequence according to the first aspect
comprising the complete monensin gene cluster;

(4) a DNA sequence coding for one or more of the
peptides set out below, said peptide having the amino
acid sequence as set out in the appended sequence data or
being a variant thereof having the specified activity:

    peptide    activity
    mon CII    epoxyhydrolase/cyclase
    mon E      S-adenosylmethionine-dependent methyltransferase
    mon T      monensin resistance gene
    mon RII    repressor protein
    mon AIX    thioesterase

(5) a recombinant cloning or expression vector
comprising a DNA sequence according to any of aspects 1-4;

(6) a transformant host cell which has been
transformed to contain a DNA sequence according to any of
aspects 1-4 and is capable of expressing a corresponding
peptide;

(7) a hybridization probe comprising a polynucleotide
which binds specifically to a region of the monensin gene
cluster selected from mon BI, mon BII, mon CI, mon CII,
mon H, mon RI, mon RII, mon T, mon AIX and mon AX;

(8) use of a probe according to aspect (7) in a
method of detecting the presence of a gene cluster which
governs the synthesis of a polyether, and optionally
isolating a gene cluster detected thereby;

(9) use of a probe comprising a polynucleotide which
binds specifically to a gene responsible for levels of
activity of the monensin gene cluster, preferably a
regulatory gene, resistance gene or thioesterase gene,
more preferably the regulatory gene mon RI, in a method of
detecting an analogous gene in a gene cluster of another
polyketide, preferably a polyether, and optionally
manipulating the gene detected thereby to alter the level
of expression of said other polyketide;

(10) a host cell, preferably Streptomyces
cinnamonensis, containing a heterologous gene under the
control of the mon RI gene and a monensin promoter;

(11) use of a portion of the monensin gene cluster
having chain terminating activity, preferably comprising
at least one of mon AIX and mon AX or a mutant or allele
thereof having chain terminating activity, to effect chain
release of a peptide other than one required for monensin
biosynthesis;

(12) use of a portion of the monensin gene cluster
having carbon-carbon double bond isomerase activity,
preferably comprising at least one of mon BI and mon BII
or a mutant or allele thereof having isomerase activity to
provide a desired stereochemical outcome in the synthesis
of a polyketide other than monensin;

(13) a polypeptide encoded by a portion of the
monensin gene cluster, preferably comprising at least one
of mon BI and mon BII or a mutant or allele thereof,
having carbon-carbon double bond isomerase activity;

(14) an epoxidase enzyme encoded by mon CI or a
derivative or variant thereof having epoxidase activity;

(15) a cyclase enzyme encoded by mon CII or a
derivative or variant thereof having cyclase activity.

Some embodiments of the invention will now be
described by way of example with reference to the
accompanying drawings in which:
Fig 1 shows the structure of monensins A and B;
Fig 2 illustrates proposed biosynthetic pathways;
Fig 3 illustrates the proposed organization of the
monensin polyketide synthase (PKS) enzyme complex; and
Fig 4 illustrates the proposed organization of the
monensin biosynthetic gene cluster.

The overall gene organization of the monensin
biosynthetic gene cluster, as shown in Fig 4, is similar
to that previously found for many macrolide biosynthetic
gene clusters, which have one or more open reading frames
(ORFs) encoding large multifunctional PKSs flanked by
other genes which encode functions required for the
biosynthesis of the antibiotic. In the case of monensin,
there is an unusually high number of distinct ORFs
encoding PKS multi-enzymes (eight in total, labelled monAI
to monAVIII) but there is again a separate module of
enzymes for each cycle of polyketide chain extension,
exactly as found for modular PKSs for macrolide
biosynthesis (see Fig 3). Thus there are 12 condensations
predicted to be required for the production of the carbon
skeleton of monensin, and in agreement with this there are
found to be 12 extension modules of PKS enzymes
distributed among the 8 PKS ORFs. However, as mentioned in
detail below, the other genes in the monensin cluster
include genes which have not previously been found in any
other gene cluster for the biosynthesis of a complex
polyketide, and which are not significantly similar to any
genes in published sequence databases. The cloned DNA for
these genes is useful to allow the diagnosis that a
polyketide biosynthetic gene cluster in any actinomycete,
uncovered previously by conventional hybridization against
a PKS gene probe from (say) the DEBS or some other
characterised PKS gene cluster, is one that governs the
synthesis of a polyether; and these genes are also
valuable either singly or in combination as specific
hybridization probes for the specific detection and
isolation of additional polyether biosynthetic gene
clusters. Examples of these previously-unknown genes are
the genes monBI, monBII, monCI and monCII. In addition the
regulatory genes monH, monRI and monRII and the resistance
gene monT and the thioesterase genes monAIX and monAX are
all useful for the detection of analogous genes in other
polyether clusters which are required for the rational
manipulation of such genes in order to increase levels of
the specific product.

The cloned and sequenced cluster of genes for 
monensin biosynthesis is useful secondly in the 
engineering of mutant strains of S. cinnamonensis and of 
other actinomycetes which are suitable strains for the 
high level production of either natural or novel 
recombinant polyketides. The sequence of the monensin 
cluster disclosed here shows the surprising fact that the
gene cluster contains a gene monRI whose gene product has
an amino acid sequence highly similar to that of actII-orf4,
the pathway-specific activator gene which activates
the actI and other promoters of the actinorhodin
biosynthetic gene cluster of Streptomyces coelicolor. The
recognition of this aspect of the natural regulation of a
Type I PKS cluster is important and valuable because
first, it is possible to increase the yield of monensin by
increasing the level of the activator MonRI, either by
placing the gene monRI under the control of a powerful
promoter or arranging for the presence within the cells of
one or more additional copies of the monRI gene (as
exemplified below); secondly, it will be possible to use
the monRI gene as a specific hybridisation probe to locate
similar genes in other complex PKS gene clusters,
especially other polyether PKS gene clusters but also
polyene and macrolide gene clusters and all other Type I
modular PKS gene clusters, even in cases where (as for
rapamycin and erythromycin) no such gene has been
previously found within the currently accepted physical
limits of the relevant biosynthetic gene cluster. In such
cases the monRI gene probe might be expected to uncover
the activator even if it resides on the chromosome at some
distance from the main body of the gene cluster; and
simple experiments would then show whether the
activator(s) so uncovered are involved in regulation of
the biosynthesis of those particular metabolites; thirdly,
increasing the copy number of the monRI gene or of any of
the activator genes uncovered will tend to increase the
yield of a heterologous polyketide by "crosstalk" where
the activator mimics the presence of the normal activator
for the transcription of the genes for that heterologous
polyketide synthase. It is clear from recently published
work (Wietzorrek, A. and Bibb, M. Mol. Microbiol. (1997)
25:1181-1184) that the ActII-orf4 family of activators
exert their effects by binding to promoter regions within 
the target gene cluster, so it will be possible to use the 
monRI gene together with monensin promoter regions to 
drive the high-level transcription and translation of
heterologous genes in Streptomyces cinnamonensis, and 
perhaps in other host strains too; such genes need not be 
PKS genes or even involved in polyketide biosynthesis. 
Monensin promoter regions are found at the 5' end of genes 
or groups of genes in the cluster and their location is 
clear from the sequence analysis disclosed here. Thus a 
useful vector would provide the monensin promoter and the 
ribosome binding site and continue up to the start of the 
open reading frame, after which the monensin ORF naturally 
found there would be replaced by the heterologous gene. 
The relative strength of the monensin promoters can be 
readily determined using any one of a number of known 
promoter probes, i.e. genes whose expression gives rise to 
readily measurable and quantifiable effects, such as Green 
Fluorescent Protein (GFP), or beta-galactosidase in the
presence of a chromogenic substrate. It should be possible
to mutate randomly the small region of the monensin
promoters especially likely to interact with the MonRI
activator (identified by the presence of tandem
heptanucleotide repeats with a common consensus sequence
between the various monensin promoters) (Wietzorrek, A.
and Bibb, M. Mol. Microbiol. (1997) 25:1181-1184), and to
determine the optimal DNA sequence for the maximal
activation effect using either S. cinnamonensis
(preferably, in case there are other unknown factors that
make the activation function better in this strain than in
other heterologous systems), or even in another host
actinomycete strain. If the natural monensin promoters 
were mutated to have this optimal recognition sequence, 
then this would further increase the production of 
monensin. By extension, the use of this modified monensin 
promoter in conjunction with the monRI gene in 
heterologous systems could form the basis of further 
improvements in expression of polyketide synthases or 
other genes, either by appropriate chromosomal alterations 
to introduce the altered promoter and also the monRI gene; 
or by provision of vectors containing these optimised 
signals linked to specific genes and housed in suitable 
host cells. 
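
The search for tandem heptanucleotide repeats in a promoter region can be illustrated by a minimal sketch. The function names, the mismatch tolerance and the example fragment are assumptions introduced here for illustration only; the fragment is invented and is not taken from the monensin cluster sequence.

```python
from collections import Counter


def tandem_heptamers(seq, max_mismatches=1, k=7):
    """Return (position, first_unit, second_unit) for every pair of
    directly adjacent k-mers that differ at no more than
    max_mismatches positions."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * k + 1):
        first, second = seq[i:i + k], seq[i + k:i + 2 * k]
        mismatches = sum(x != y for x, y in zip(first, second))
        if mismatches <= max_mismatches:
            hits.append((i, first, second))
    return hits


def consensus(units):
    """Column-wise majority base across equal-length repeat units."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))


# Invented promoter fragment (hypothetical, for illustration only):
# flank + TCGAGCC + TCGAGCG + flank
promoter = "GGCATCGAGCCTCGAGCGTTAACC"
hits = tandem_heptamers(promoter)
repeat_consensus = consensus(["TCGAGCC", "TCGAGCG", "TCGAGCC"])
```

Because a window that is shifted by a few bases within a true repeat also scores as a near-match, overlapping hits are expected and would normally be merged before a consensus is derived from the repeat units of several promoters.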

The sequencing of the monensin cluster has uncovered 
another strategy for gene regulation in such Type I
clusters. The previously-sequenced genes for the rapamycin
biosynthetic pathway in Streptomyces hygroscopicus
included a gene of unknown function (rapH). A closely
similar gene has now been found in the monensin 



WO 01/68867    PCT/GB00/02072



biosynthetic gene cluster (monH), and it is clear from
this recurrence (and the comparison of the sequences with
those of database proteins) that this gene is potentially
an important DNA-binding sensor gene which acts to
regulate the transcription of the cluster in concert with
other regulatory signals. Simple experimentation is needed
in order to define whether the gene is an activator, in
which case putting in another copy or increasing its
transcription will have the potential to increase
polyketide biosynthesis; or alternatively the rapH gene
product may be a negative regulator, whereupon deletion of
this gene may release the biosynthetic pathway from this
inhibitory effect and increase yields.

There is a continuing need to develop new methods of
high-level production of bioactive metabolites and other
valuable gene products in actinomycetes. Streptomyces
cinnamonensis is a recognised and very valuable industrial
strain for the production of very high levels of monensin;
it is readily transformable with DNA by standard methods
of conjugation or of protoplast transformation; it is a
host for numerous known broad range plasmids including
well-known expression plasmids of both high- and low-copy
number; it also grows quickly relative to other
actinomycete strains (for example about three times faster
than wild type Saccharopolyspora erythraea, the
erythromycin producer, under comparable conditions) and
sporulates relatively easily. Heterologous polyketides can
be expressed in Streptomyces cinnamonensis using for
example the low-copy number plasmid pCJR24 (which has no
origin of replication active in actinomycetes, so is
maintained by integration into the chromosome) (Rowe, C.
et al. Gene (1998) 216:215-223) or the related plasmid
pCJR29, in which the polyketide synthase gene(s) are placed
under the control of the actI promoter which is activated
by the ActII-orf4 activator; or alternatively the monAI
promoter can be substituted together with the MonRI
activator; or some other pairing of activator and cognate
promoter chosen from either a Type II or a Type I
polyketide synthase gene cluster. As an example, the wild
type strain of Streptomyces cinnamonensis has been used to
express the plasmid pCJR29 (Rowe, C. et al. Gene (1998)
216:215-223) containing as insert the three ORFs for the
PKS governing the production of 6-deoxyerythronolide B,
the macrolide precursor of erythromycin A in
Saccharopolyspora erythraea, these genes being placed
under the control of the pathway-specific actI promoter
from Streptomyces coelicolor together with its cognate
activator gene actII-orf4. The transformed strain when
cultivated in a suitable liquid medium produced
6-deoxyerythronolide B in good yield.

It is well known to the person skilled in the art
that it is possible to use standard vectors unable to
replicate in actinomycetes to introduce DNA into a
Streptomyces cell, such DNA comprising two portions of
contiguous DNA which are each identical to one of two
portions of the cell's chromosome that are spaced up to
100 kbp apart; and that through recombination between the
incoming DNA and the chromosome occurring in both portions
of DNA the net result is that the chromosomal sequence is
replaced by the defective sequence originally that of the
incoming DNA. Such a procedure has been applied to the
monensin-producing strain of S. cinnamonensis as described
in detail below, and a strain of S. cinnamonensis has been
obtained that carries a specific deletion in the monensin
cluster and which is unable to produce the antibiotic. The
use of such a strain facilitates the production of
heterologous polyketides by removal of the background of
monensin production.
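
The net outcome of recombination in both homologous portions can be pictured with a short string-based sketch. It models only the end result (replacement of the chromosomal segment lying between the two portions of homology), not the recombination mechanism; the function name and the toy sequences are hypothetical and are introduced here purely for illustration.

```python
def double_crossover(chromosome, left_arm, insert, right_arm):
    """Return the chromosome after replacement of the segment lying
    between two homology arms by `insert` (empty string = deletion)."""
    start = chromosome.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found in chromosome")
    left_end = start + len(left_arm)
    right_start = chromosome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    # Sequence up to the end of the left arm, then the incoming DNA,
    # then everything from the start of the right arm onwards.
    return chromosome[:left_end] + insert + chromosome[right_start:]


# Toy example: delete the segment CCCCCCCC lying between the two arms.
wild_type = "AAAA" + "GATTACA" + "CCCCCCCC" + "TTGGCCAA" + "GGGG"
deletion_mutant = double_crossover(wild_type, "GATTACA", "", "TTGGCCAA")
```

Supplying a non-empty insert (for example a resistance marker) in place of the empty string models a replacement rather than a clean deletion.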

The multiple uses of portions of the cloned and
sequenced DNA from the monensin cluster will readily occur
to the person skilled in the art. A feature of
the PKS of the monensin cluster is an unusual mechanism of
polyketide chain initiation. We have found that the
monensin PKS loading module has three domains, which from
the amino-terminus of the protein are: a KSq domain, an
acyltransferase domain and an ACP domain. We have
uncovered this organisation in the PKS for the 14-membered
macrolide oleandomycin as well as in the monensin PKS, an
organisation of the loading module previously only found
for the 16-membered macrolides and in which the KSq domain
(which looks like a ketosynthase or condensation domain
except that the active site cysteine residue is
substituted by a glutamine for which the single letter
notation is Q) had been previously speculated to have no
function. It was realised that the acyltransferase of the
loading module actually has malonyl-CoA and not acetyl-CoA
as a substrate and that KSq is an active decarboxylase. It
appears that a better discrimination can be achieved in
the selection of the smaller acetate unit over propionate
if the choice is made initially between methylmalonyl- and
malonyl-CoA.

An unprecedented feature of the monensin PKS genes is
that no integral chain-terminating domain is present as a
C-terminal appendage of the PKS extension module that
catalyzes the twelfth and final chain extension. Because
the product of the monensin PKS is a carboxylic acid, it
would have been firmly predicted that chain release would
have been catalyzed by such a C-terminal domain containing
a "thioesterase" activity. Previously sequenced PKS gene
sets have been of two sorts: first, those macrolide PKSs
typified by erythromycin, spiramycin, tylosin, niddamycin
which have a readily recognisable C-terminal
"thioesterase" domain, which in these enzymes functions as
a specific cyclase rather than releasing the polyketide
product as a free carboxylic acid; secondly, those
macrolide PKSs typified by rapamycin, FK506, and
rifamycin, where there is an alternative and recognised
mode of chain termination by transfer of the polyketide
chain to an acceptor moiety, catalyzed by a specific
enzyme (eg pipecolate incorporating enzyme for rapamycin
(Schwecke T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843) and FK506 (Motamedi H. and Shafiee A. Eur.
J. Biochemistry (1998) 256:528-534); arylamine synthetase
for rifamycin (August P.R. et al. Chemistry & Biology
(1998) 5:69-79)).

The monensin PKS surprisingly falls into neither 
category, and therefore seems to be the first example of a 
novel mode of chain termination. It is novel and 
noteworthy in this connection that the monensin PKS gene
cluster contains two small genes that encode discrete,
monofunctional thioesterase enzymes. Although many PKS
gene clusters have been previously shown to contain one
such discrete thioesterase, none have been shown to have
two. The role of such thioesterases is not known, although
in the case of methymycin/pikromycin PKS, which has been
reported to be responsible for the biosynthesis of both
the 12-membered macrolide methymycin and the 14-membered
macrolide pikromycin (Xue Y.Q. Proc. Natl. Acad. Sci. USA
(1998) 95:12111-12116) the disruption of this thioesterase
reportedly caused a ten-fold drop in the amount of both
macrolides produced. A similar finding has been reported
for the discrete thioesterase of the tylosin PKS gene
cluster (Cundliffe E. et al. Chemistry & Biology in
press). Additional copies of such thioesterases may
therefore accelerate the production of a specific
polyketide, but this has not yet been demonstrated.
However, the presence of the discrete thioesterase is not
completely essential for polyketide production.

It is highly desirable to have a broadly effective
method of catalysing the release of polyketide gene
products from a PKS as the free acid. The well-studied
integral thioesterase domain of the erythromycin PKS
has a broad specificity in cyclization to
form a lactone (assuming that a hydroxy group is present
in the growing polyketide chain at an appropriate
position), but hydrolysis to form the free acid is very
slow. The recognition of the unusual arrangement of the
monensin PKS means that it is now possible to harness
either the entire PKS module that catalyses the twelfth
and final extension cycle in monensin biosynthesis, or the
C-terminal portion of it, and graft it onto a different
polyketide synthase by genetic engineering, so as to allow
the release mechanism characteristic of monensin to
operate in a different context. The use of this portion
only of the monensin PKS suffices to allow the novel
mechanism of chain release to operate successfully. The
speed of the polyketide chain hydrolysis in a given case
can depend on the additional presence of one or both of
the discrete thioesterase genes (monAIX and monAX) from
the monensin gene cluster. The use of this novel method of
chain termination represents a valuable way of generating
a large number of novel engineered polyketides that are
currently inaccessible, and ensuring that the products
have a specified chain length.

The genes monBI and monBII appear to encode very
similar enzymes with significant amino acid sequence
similarity to authentic ketosteroid isomerases which are
known to catalyse the migration of an activated
carbon-carbon double bond. The conservation of active site
residues makes it very likely that these mon genes govern
a reaction involving activated double bonds in the
biosynthetic pathway to monensin and this surprising
observation can be accommodated if the initial product of
the polyketide chain growth on the monensin PKS is a
linear precursor in which the double bonds were initially
formed with a conventional trans or E (entgegen) geometry;
but before the polyketide chain was extended by insertion
of the next unit the monBI and/or the monBII gene
product(s) catalyse the specific rearrangement of the
newly-created double bond into the cis or Z (zusammen)
geometry. This new view of the monensin biosynthetic
pathway allows the deduction that the monBI and monBII
genes, perhaps in combination with specific portions of
the monensin modules where they normally exert their
effects (namely modules 3, 5 and 7) might be used in order
to achieve the extremely desirable targeted biosynthesis
of novel polyketides containing double bonds with Z
geometry at specified point(s) along the chain. Thus for
example it should be possible to provide for the direct
biosynthesis of the C22-C23 cis or Z double bond in
avermectins, thus avoiding tedious and expensive chemical
conversion of an initial fermentation product into this
important anthelminthic. Only limited experimentation is
needed to see whether the monBI and/or monBII gene
products are sufficient or whether the mon PKS at modules
3, 5 and 7 forms part of the specific docking site(s) for
the isomerases and therefore must also be used in the
creation of the hybrid PKS that will insert the cis or Z
double bond at the desired position. The substrate
specificity of the isomerases need not be limited to
2,3-unsaturated thioesters. The purified enzymes could also be
used to effect such isomerisations in vitro, depending on
the position of the equilibrium or whether further enzymes
are used to achieve the further transformation of the
product as it is formed (vide infra).

The product of the monCI gene is a novel oxidative 
enzyme with some sequence similarity to authentic examples 
of such enzymes in the databases; and with a clearly 
definable role in the monensin biosynthetic pathway, the
epoxidation of the double bonds at three separate
positions in the initially-formed acyclic intermediate in
monensin biosynthesis. This epoxidase could therefore be
used in conjunction with monBI/monBII gene products to
effect oxidative reactions on suitable substrates in vitro
and in vivo. Similarly the monCII gene product is a
putative cyclase that opens the epoxides and causes the
formation of ether rings in monensin.

Any or all of the monBI, monBII, monCI or monCII
genes may be introduced into a heterologous strain
containing the gene cluster for another polyether, in
order to divert the biosynthetic pathway and produce a
polyketide of altered structure. In these experiments the
analogues of these monB genes could either be present or
(once located and characterised using the mon genes as
probes) they may be deleted prior to the introduction of
the monB and monC genes into that strain. The converse
experiment in which analogues of the monB and monC genes
from other strains are introduced into S. cinnamonensis
likewise has the potential to produce novel oxidised
polyketides. Also, the monB and monC genes or their
analogues may be introduced into a strain that normally
produces a macrolide or a polyene or some other complex
polyketide and expressed there, when they may effect the
diversion of the growing polyketide chain on a
heterologous modular PKS towards a new product, which may
or may not have the structure of a polyether.

The availability of the monensin gene sequence allows
the institution of domain swaps to alter the
acyltransferase (AT) specificity of a given module; for
example the ethylmalonyl-CoA specific extender found in
one of the modules of the monensin PKS can be used to
replace one of the other ATs to generate an ethyl side
branch at that position in the chain, or the AT can be
used to substitute in any other (e.g. macrolide) PKS, as
described in WO 98/01571 and WO 98/01546. Similarly the
alteration of the level of reduction in a module, by
manipulation of the reductive enzymes, can be applied to
the monensin genes and here it will produce, depending on
which module is affected, either an altered monensin, or a
species which is only partly cyclised, or a polyether with
an altered pattern of cyclisation, or even a linear
polyketide.

In general the targeted alteration of the pattern of
substitution of sidechains or reduction levels along the
polyketide chain produced by the monensin PKS will, like
the disruption or deletion of the oxidative enzymes
mentioned above, lead to non-polyether polyketide
products. It should be possible, by introduction of the
DEBS thioesterase at the C-terminus of one of the later
modules of the monensin PKS, together with an
appropriately placed hydroxy group earlier in the chain,
to produce novel macrolide products from this polyether
PKS system, or alternatively novel polyenes of defined
chain length and chosen ring size.

Example 1

Cloning of the monensin A biosynthetic gene cluster using
DNA probes derived from the erythromycin-producing
polyketide synthase of Saccharopolyspora erythraea

A genomic library of the monensin A producing strain
Streptomyces cinnamonensis ATCC 15413 was constructed
using methods well-known in the art, namely, the
production of high molecular weight genomic DNA, followed
by the partial cleavage of this DNA using the
frequent-cutting restriction enzyme Sau3A, fractionation of the
fragments on a sucrose gradient and selection of fragments
of average size 35-40 kbp, and the cloning of these
fragments into the cosmid vector pWE15 (Evans, G.A. et al.
Gene (1989) 79:9-20) which had been previously digested
with BamHI and treated with shrimp alkaline phosphatase.
The library was packaged and transfected into Escherichia
coli XL-1 Blue MR cells. The library was plated out on
2xTY agar medium (10 g tryptone, 10 g yeast extract, 5 g
NaCl, 15 g bactoagar per litre containing ampicillin 50
µg/ml) for cosmid selection and the colonies were allowed
to grow overnight. The library was then screened by
hybridisation using as a probe DNA encoding the
ketosynthase domain of module 1 of the
erythromycin-producing PKS (6-deoxyerythronolide B synthase, DEBS) of
Saccharopolyspora erythraea. The colonies giving a
positive hybridisation signal were
selected and the cosmid DNA from each colony was purified
and mapped by restriction digestion. The presence of the
target biosynthetic genes on a cosmid was verified by
sequencing of the ends of the cosmid inserts using the
commercially available T3 and T7 primers which hybridise
specifically to the respective ends of each cosmid insert
(Evans, G.A. et al. Gene (1989) 79:9-20).
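
The relationship between complete and partial cleavage by Sau3A (recognition sequence GATC, cleaved immediately 5' of the G) can be sketched in silico as follows. The function names and the per-site cutting probability are illustrative assumptions; the model treats every site as cut independently, which is a simplification of a real partial digest.

```python
import random


def sau3a_sites(seq):
    """Positions at which Sau3A cuts a linear sequence (5' of each GATC)."""
    seq = seq.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites


def fragment_lengths(seq_length, cut_positions):
    """Fragment lengths produced by cutting a linear molecule."""
    bounds = [0] + sorted(cut_positions) + [seq_length]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def partial_digest(seq, cut_probability, rng):
    """Cut each site independently with the given probability."""
    chosen = [s for s in sau3a_sites(seq) if rng.random() < cut_probability]
    return fragment_lengths(len(seq), chosen)


toy = "AAGATCTTTTGATCCC"                    # invented 16-base sequence
complete = fragment_lengths(len(toy), sau3a_sites(toy))
partial = partial_digest(toy, 0.3, random.Random(0))
```

Lowering the cutting probability shifts the size distribution towards larger fragments, which is the reason a partial rather than a complete digest is used when inserts of 35-40 kbp are required.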
Example 2 

Sequencing of the biosynthetic gene cluster for monensin A
from Streptomyces cinnamonensis

Three cosmids obtained by screening of the genomic
library of S. cinnamonensis were used to obtain the entire
DNA sequence of the monensin biosynthetic gene cluster.
These cosmids, MO.CN02, MO.CN11 and MO.CN33, between them
contain the entire DNA sequence of the cluster and the
adjacent regions of the chromosome. They have been
deposited in NCIMB, 23 St Machar Drive, Aberdeen AB24
3RY, UK, under the NCIMB accession numbers 40956
(MO.CN11), 40957 (MO.CN33) and 40958 (MO.CN02).

The DNA of each cosmid was separately subjected to
partial digestion with Sau3A and fragments of
approximately 1.5-2.0 kbp were separated by agarose gel
electrophoresis. The fragments were then ligated into the
plasmid vector pUC18 (Messing, 1982), previously digested
with BamHI and treated with shrimp alkaline phosphatase.
The library was transformed into E. coli strain XL1-Blue
MR and plated on 2xTY agar medium containing ampicillin
(100 µg/ml) to select for plasmid-containing cells.
Plasmid DNA was purified from individual colonies and
sequenced using the Sanger dye-terminator procedure on an
ABI 377 automated sequencer (Sanger, F. Science (1981)
214:1205-1210). The sequence data obtained from single
random subclones of a cosmid were assembled into a single
continuous sequence and edited using the GAP4.1 program of the
STADEN gene analysis package (Staden, R. Molecular
Biotechnology (1996) 5:233-241).

The sequence is set out in the appended sequence
listing.

Tables I and II contain data about individual genes 
and gene products. 
Example 3 

Inactivation of the monensin A biosynthetic gene cluster

A chromosomal gene disruption experiment was used to
verify the identity of the cloned polyketide synthase gene
cluster. Plasmid pMOB6314 is a pUC18 sequencing subclone
of the presumed monensin A biosynthetic gene cluster
prepared as described in Example 1, whose inserted DNA
comprises the DNA sequence from nucleotide 9763 to
nucleotide 10108 in SEQ ID 1, and which therefore contains
a region of DNA wholly internal to orfE, a putative
3-O-methyltransferase. A HindIII fragment containing the
thiostrepton resistance gene tsr from plasmid pIJ702
(Katz, E. et al. J. Gen. Microbiol. (1983) 129:2703-2714)
was cloned into the HindIII site of plasmid pMOB6314 and
the ligation mixture was used to transform E. coli cells.
Transformants bearing the required plasmid pMOAEOl were
identified by isolation of plasmid DNA and analysis by
restriction digestion. Plasmid pMOAEOl was used
to transform protoplasts of Streptomyces cinnamonensis as
described by Hopwood D.A. et al. (1985). Since plasmid
pMOAEOl lacks an origin of replication that is active in
Streptomyces, growth in the presence of thiostrepton (25
µg/ml) in the regeneration medium led to the isolation of
stable integrants. Isolated putative integrants were
tested for the presence of integrated pMOAEOl sequences by
Southern hybridisation. A clone of Streptomyces
cinnamonensis identified by its restriction pattern in
Southern hybridisation as bearing pMOAEOl integrated in
the region of monE of the monensin A biosynthetic gene
cluster was designated S. cinnamonensis MO-DDOl.

Detection of production of the monensin A related
metabolites produced by S. cinnamonensis MO-DDOl was
performed by GC-MS analysis of methanol extracts of the
entire broth harvested after 72 hours of growth of the
strain. No significant amounts of monensin A-related
metabolite production were detectable.

Example 4 

Overproduction of erythromycin aglycone in Streptomyces
cinnamonensis

S. cinnamonensis is a suitable system for
overproduction not just of monensin A but also of other
polyketide metabolites. Established techniques of genetic
transformation allow fast introduction of foreign
polyketide producing gene sets into this host. Fast
growth of S. cinnamonensis in liquid culture and optimal
precursor supply favour high yield of polyketide
metabolites.

Construction of pIB061

S. erythraea NRRL2338 was transformed with pCJR30
(Rowe, C. J., et al. (1998) Gene 216:215-223) using a
routine protoplast transformation technique as described
by Hopwood et al. (1985). A stable integrant of S.
erythraea [pCJR30] was identified and the production of
10 mg/L of the triketide lactone (delta lactone of
(2S,3R,4R,5R)-2,4-dimethyl-3,5-dihydroxyheptanoic acid)
in addition to erythromycins was confirmed by MS
analysis.

Total DNA of S. erythraea [pCJR30] was purified and
approximately 200 ng was digested with EcoRI endonuclease.
The digestion mixture was precipitated with isopropanol
and the resulting DNA was treated with T4 DNA ligase for
16 hours at 16°C. The ligation mixture was used to
transform E. coli DH10B cells. The transformants were
screened for the presence of the plasmid. A clone
containing a 44.7 kb plasmid was identified and confirmed
by restriction analysis to contain three complete genes:
eryAI, eryAII and eryAIII. The plasmid was named pIB061.

Transformation of S. cinnamonensis

Protoplasts of S. cinnamonensis were prepared by a
modified procedure of Hopwood et al. (1985). Plasmid
pIB061 was transformed into the protoplasts of S.
cinnamonensis and stable thiostrepton resistant colonies
were isolated. Individual colonies were checked for their
plasmid content and the presence of plasmid pIB061 was
confirmed by its restriction pattern. S. cinnamonensis
(pIB061) was inoculated into 250 ml of M-C3 minimal
production medium containing 10 µg/ml of thiostrepton and
allowed to grow for 72 hours at 30°C. After this time the
mycelia were removed by filtering. The broth was extracted
with two volumes of ethyl acetate and the combined ethyl
acetate extracts were washed with an equal volume of
saturated sodium chloride, dried over anhydrous sodium
sulphate, and the ethyl acetate was removed under reduced
pressure to give about 200 mg of crude product. The
product was analysed by LCQ and its mass was confirmed to
be that of erythronolide B.

This example demonstrates the importance of S.
cinnamonensis for production of high levels of foreign
polyketide antibiotics. Introduction of the complete
erythromycin gene cluster or other gene clusters into this
system is likely to produce high levels of the
corresponding metabolites.
Example 5 

Construction of plasmid pCJW58 containing the monensin
activator gene under the ermE* promoter

The ermE* promoter derived from the ermE resistance
methyltransferase gene of S. erythraea (Bibb et al. Gene
(1985) 38:215-226) was amplified by PCR as a SpeI-XbaI
fragment using the following oligonucleotides
5'-CCACTAGTATGCATGCGAGTGTCCGTTCGAGT-3' and
5'-TTGTATACACCTAGGATGGTTGGCCGTGC-3' with pRH3 (Dhillon et al.
Molecular Microbiology (1989) 3:1405-1414) as a template
and cloned into SmaI-digested, phosphatase-treated pUC18,
to produce plasmid pIB135. The integrative plasmid pSET152
(Bierman, M. et al. (1992) Gene 116:43-49) was digested
with XbaI and the backbone was dephosphorylated and
ligated to the SpeI-XbaI fragment of pIB135 containing the
ermE* promoter. The ligation mixture was used to
transform E. coli DH10B and the orientation of the insert
in the plasmids from individual clones was checked by
restriction analysis. A plasmid with the ermE*
promoter oriented so that the NdeI and XbaI sites are
adjacent to the apramycin resistance gene was selected and
named pIB139.

The monR gene from the monensin biosynthetic gene
cluster was amplified and NdeI and XbaI restriction sites
introduced at 5' and 3' ends respectively, by PCR using as
primers the following oligonucleotides:
5'-AGA TAG CAT ATG CTG GGC CCG CTC CGC AT -3'
and 5'-AAT GCT CTA GAG TGT GAG GGA GGG GAG AGG GGG AA-3'
and cosmid MO.CN11 as template. The PCR product was
ligated into SmaI-treated and phosphatase-treated plasmid
pUC18 and the ligation mixture was used to transform E.
coli DH10B cells. Transformant colonies were analysed for
the presence of plasmid and the identity of the plasmid
inserts was verified by sequencing. A plasmid whose
insert contained the monR gene flanked by NdeI and XbaI
restriction sites was selected and designated pCJW57.
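
As a quick check that the two primers carry the intended sites, the printed sequences can be searched for the NdeI (CATATG) and XbaI (TCTAGA) recognition sequences. This is a minimal sketch: the helper name is hypothetical, and the primer strings are copied as printed above, so any error in the printed text carries over.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "XbaI": "TCTAGA"}


def find_sites(primer, sites=SITES):
    """Map enzyme name -> 0-based offset of its site in the primer.
    Spaces in the printed primer are ignored; absent sites are omitted."""
    seq = "".join(primer.upper().split())
    return {name: seq.find(motif) for name, motif in sites.items() if motif in seq}


forward = "AGA TAG CAT ATG CTG GGC CCG CTC CGC AT"
reverse = "AAT GCT CTA GAG TGT GAG GGA GGG GAG AGG GGG AA"

forward_sites = find_sites(forward)
reverse_sites = find_sites(reverse)

# The ATG within the NdeI site (CAT|ATG) can serve as the start codon,
# which is what allows an in-frame fusion to an upstream promoter and
# ribosome binding site.
start_codon_offset = forward_sites["NdeI"] + 3
```

Each primer carries only the site expected for its end of the gene, consistent with the directional NdeI-XbaI cloning described for the construction of pCJW58.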

Plasmid pCJW57 was digested with NdeI and XbaI and
the fragment containing the monR gene was ligated together
with the backbone of plasmid pIB139 which had been
digested with the same two restriction enzymes, and
purified by gel elution. The ligation mixture was used to
transform E. coli strain DH10B cells. Transformant
colonies were analysed for the presence of plasmid and the
identity of the plasmid inserts was verified by
restriction analysis. One such recombinant was selected
and named plasmid pCJW58.

Plasmid pCJW58 was used to transform the
methylation-deficient E. coli strain ET12567 (MacNeil D. J. et al.
(1992) Gene 111:61-68) and the recovered, unmethylated
plasmid was then used to transform the same E. coli strain
ET12567 housing the plasmid pUB307, a derivative of RP4
which is mob+ and which contains a gene for kanamycin
resistance (Piffaretti, J. C. et al. (1988) Mol. Gen.
Genet. 212:215-218). Recombinants were plated on 2 x TY
agar medium containing apramycin and kanamycin at final
concentrations of 50 micrograms per ml and 50 micrograms
per ml respectively. The plasmid content of recombinants
was checked by isolation of plasmid DNA and verification of
the identity of these plasmids by restriction analysis. One
such clone which contained both pUB307 and plasmid pCJW58
was selected and used for further experiments.

Construction of Streptomyces cinnamonensis (pCJW58)
and production of monensins

A single colony of E. coli ET12567 housing both
pUB307 and pCJW58 was toothpicked into 3 ml of TY liquid
medium, containing apramycin and kanamycin at 25 and 25
micrograms per ml respectively, and grown overnight at 37°C. This
culture was used to inoculate 25 ml of TY medium,
supplemented with the same antibiotics at the same
concentrations, and growth was continued until the
absorbance at 600 nm (1 cm pathlength) was between 0.3 and
0.6. The cells were centrifuged (room temperature, 7
minutes, 2000 x g), resuspended in TY liquid medium (10
ml) containing no added antibiotics, re-centrifuged as
before, then resuspended in 2 ml of TSB medium and placed
on ice. Meanwhile, 0.5 ml of TSB medium was added to 100
microlitres containing approximately 10^ spores of S.
cinnamonensis. After a brief heat shock at 50°C for 10
minutes, the suspension was briefly cooled, mixed with
0.5 ml of donor E. coli cells, and plated on solid A
medium, which has composition as follows:



Sigma wheat starch       5 g
Corn steep powder        1.25 g
Yeast extract            1.5 g
CaCO3                    1.5 g
FeSO4                    6 mg
DIFCO agar               10 g
H2O                      to 500 ml

pH adjusted to pH 7 with KOH, and to which in addition
MgCl2 was added to a final concentration of 10 mM.
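
The recipe above is stated per 500 ml batch. As a minimal sketch of how it might be scaled to another volume, and of how much MgCl2 corresponds to 10 mM, the following Python snippet can be used. The helper names are hypothetical, the CaCO3 figure is read from a damaged line of the source, and the use of the anhydrous MgCl2 molar mass is an assumption (the text does not say which salt form was weighed).

```python
# Solid A medium: grams of each component per 500 ml batch, as listed above.
# The CaCO3 value is read from a damaged line and should be treated as uncertain.
A_MEDIUM_G_PER_500_ML = {
    "wheat starch": 5.0,
    "corn steep powder": 1.25,
    "yeast extract": 1.5,
    "CaCO3": 1.5,
    "FeSO4": 0.006,
    "agar": 10.0,
}

# Assumption: anhydrous MgCl2 (24.305 + 2 * 35.453 g/mol); a hydrate would differ.
MGCL2_ANHYDROUS_G_PER_MOL = 95.21


def scale_recipe(recipe_g, batch_ml, target_ml):
    """Scale every component linearly from batch_ml to target_ml."""
    factor = target_ml / batch_ml
    return {name: grams * factor for name, grams in recipe_g.items()}


def grams_for_molarity(molar_mass, millimolar, volume_ml):
    """Mass of solute needed for a given final concentration and volume."""
    return molar_mass * (millimolar / 1000.0) * (volume_ml / 1000.0)


per_litre = scale_recipe(A_MEDIUM_G_PER_500_ML, 500, 1000)
mgcl2_g = grams_for_molarity(MGCL2_ANHYDROUS_G_PER_MOL, 10, 500)

for name, grams in per_litre.items():
    print(f"{name}: {grams:g} g per litre")
print(f"MgCl2 (anhydrous) for 10 mM in 500 ml: {mgcl2_g:.3f} g")
```

Linear scaling is the only design choice here; it ignores practical points such as adding MgCl2 from a sterile stock after autoclaving, which the source does not describe.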

The plates were allowed to dry overnight at room
temperature, and were then allowed to incubate a further
18 hours at 30°C. After this time each 25 ml plate was
overlaid with a solution of apramycin (final concentration
50 micrograms per ml) and nalidixic acid (final
concentration 20 micrograms per ml), and the plates were
allowed to incubate for four days at 30°C. At this time
individual colonies were toothpicked onto solid A medium
and allowed to grow. Four representative colonies from
the A medium plate were grown up in liquid modified YEME
medium, which has composition as follows:

Modified YEME medium

Sucrose                  100 g
DIFCO Yeast extract      3 g
Bacto peptone            5 g
Oxoid Malt extract       3 g
Glucose                  10 g
H2O                      to 1 L

pH adjusted to pH 7.2 with NaOH.

These cultures were used to provide a 2% vol/vol
inoculum for 30 ml of modified YEME which was grown for 7
days, and then transferred to SM16 medium, which has
composition as follows:

SM16 medium

3-[N-Morpholino]-propane sulfonic acid
(MOPS) buffer                                20.9 g
L-proline                                    10.0 g
Glucose                                      20 g
NaCl                                         0.5 g
K2HPO4                                       2.1 g
Ethylenediaminetetraacetic acid, sodium
salt                                         0.25 g
MgSO4.7H2O                                   0.49 g
CaCl2.2H2O                                   0.029 g
Trace elements solution (Hopwood,
D. A. et al. (1985) Genetic Manipulation
of Streptomyces - a Laboratory Manual,
at p. 235)                                   2 ml
0.5 M CoCl2 solution                         2 microlitres
H2O                                          to 1 L

pH adjusted to pH 7 with NaOH.

After growth for a further 7 days, mycelium was
collected by centrifugation at 2000 x g for 30 minutes,
and the supernatant was extracted three times with 300 ml
of ethyl acetate. The combined extracts were concentrated
by evaporation under reduced pressure to an oil, which was
mixed with 1 ml of methanol. Samples were applied to an
LCQ liquid chromatograph fitted with a mass spectrometer
detector unit. The column used was a C18 reversed phase
column, equilibrated with a mixture of 80% 20 mM ammonium
acetate/20% acetonitrile, and the column was eluted with a
gradient of increasing acetonitrile, reaching 100%
acetonitrile over 24 minutes. Monensins A and B emerged
from the column with retention times of 8.2 minutes and
9.2 minutes respectively. The relative amounts of monensin
produced by three independent clones (A-C) containing an
additional copy of the monR gene were compared to a
control fermentation of the wild type S. cinnamonensis
strain, with the results shown in the Table below:



Table showing increased monensin production in strains
bearing additional copy of monR gene

Strain      monensin A            monensin B
            concentration         concentration
            (arbitrary units)     (arbitrary units)

Control     188                   861
A           430                   1 800
B           450                   1 300
C           249                   1 300

Example 6. Construction of S. cinnamonensis M12AT5

A region lying immediately 5' of the DNA encoding the 






acyltransferase (AT12) domain of module 12 of the monensin
polyketide synthase in the monensin biosynthetic gene
cluster was amplified with the following primers:
5'-GGTGGCCACGGAAACACCAACACCGGACCCGCGCC-3' and
5'-CTCTCGGAGGCCCGGCGCAACGGCCACAA-3', using cosmid MO-CN11
as a template. The PCR product was ligated into SmaI-digested
and phosphatase-treated plasmid pUC18 and the
ligation mixture was used to transform E. coli DH10B
cells. Transformant colonies were analysed for the
presence of plasmid and the identity of the plasmid
inserts was verified by sequencing. A plasmid whose
insert contained a fragment upstream of the AT12-encoding
sequence, from about 82.3kb to 83.2kb of the mon cluster,
was designated pM081. Similarly a region lying immediately
3' of the DNA encoding the acyltransferase (AT12) domain
of module 12 of the monensin polyketide synthase in the
monensin biosynthetic gene cluster was amplified with the
following primers:
5'-GGCCTAGGGCTGCCTCGGGTGGTGGATCTGCCGA-3' and
5'-TGGTCGGGCGCGGTGCGTGCGATACGT-3', using cosmid
MO-CN11 as a template. The PCR product was ligated into
SmaI-treated and dephosphorylated pUC18 and the ligation
mixture was used to transform DH10B E. coli cells.
Transformant colonies were analysed for the presence of
plasmid and the identity of the plasmid inserts was
verified by sequencing. A plasmid whose insert contained
a fragment downstream of the AT12-encoding sequence, from
80.5kb to 81.4kb of the mon cluster, was designated pM082.
The DNA encoding AT of module 5 was amplified and
MscI and AvrII restriction enzyme recognition sites were
introduced at the ends by PCR using the following primers:
5'-CCTGGCCAGGGCGGCCAGTGGGTGGGCATG-3' and
5'-GGCCTAGGGGTCGGCCGGGAACCAGCGCCGCCAGT-3' and the cosmid
MO-CN33 as a template. The PCR product was ligated into
SmaI-treated and dephosphorylated pUC18 and the ligation
mixture was used to transform DH10B E. coli cells.
Transformant colonies were analysed for the presence of
plasmid and the identity of the plasmid inserts was
verified by sequencing. A plasmid whose insert DNA, with
sequence from about 44.2kb to 45.2kb of the mon cluster,
encoded the AT5 domain was designated pM083.
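
As an illustrative cross-check of the primer design described in this Example, the short Python sketch below searches the first-listed primer of each pair for the MscI (TGGCCA) and AvrII (CCTAGG) recognition sequences. The primer sequences are copied from the text; the dictionary labels and the function name are hypothetical, and a plain substring search is assumed to be sufficient because both sites are palindromic.

```python
# Recognition sequences of the two enzymes named in the text.
RECOGNITION_SITES = {
    "MscI": "TGGCCA",
    "AvrII": "CCTAGG",
}

# Primer sequences as printed in the text; the labels are illustrative only.
PRIMERS = {
    "AT12 5'-flank, primer 1": "GGTGGCCACGGAAACACCAACACCGGACCCGCGCC",
    "AT12 3'-flank, primer 1": "GGCCTAGGGCTGCCTCGGGTGGTGGATCTGCCGA",
    "AT5, primer 1": "CCTGGCCAGGGCGGCCAGTGGGTGGGCATG",
    "AT5, primer 2": "GGCCTAGGGGTCGGCCGGGAACCAGCGCCGCCAGT",
}


def sites_in(primer):
    """Return the names of the recognition sites found in a primer."""
    sequence = primer.upper()
    return [name for name, site in RECOGNITION_SITES.items() if site in sequence]


for label, sequence in PRIMERS.items():
    found = sites_in(sequence)
    print(f"{label}: {', '.join(found) if found else 'no site found'}")
```

Each of the four primers carries exactly one of the two sites close to its 5' end, which is consistent with the MscI and AvrII junctions used in the subsequent cloning steps.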

pM081 was digested with MscI and HindIII and ligated
to the 0.9kb MscI - HindIII fragment of pM082. A clone
containing both fragments was designated pM084. Plasmid
pM084 was cleaved with AvrII and HindIII, treated with
phosphatase, and ligated together with the 1.0 kb AvrII -
HindIII fragment of pM083 to produce pM085, which contains
the DNA encoding the AT5 domain flanked by DNA from either
side of the DNA encoding the AT12 domain of the monensin
PKS. The thiostrepton resistance gene tsr, derived from
plasmid pIJ702 (Katz, E. et al., J. Gen. Microbiol.,
1983), was cloned into the HindIII site of pM085. The
resulting plasmid pM086 was analysed by its restriction
pattern and confirmed to contain all the desired
elements.

Plasmid pM086 was used to transform S. cinnamonensis
protoplasts as described by Hopwood, D. A. (1985). Stable
thiostrepton-resistant transformants were isolated and
checked for the desired integration of the pM085 into the
AT12 flanking regions by Southern blot hybridisation. One
such integrant, S. cinnamonensis MO-08, containing pM085
integrated upstream of the AT12, was passed through 4
cycles of sporulation on a non-selective nutrient
medium. Spores obtained after the fourth cycle were
replica-plated onto media with and without thiostrepton.
DNA of clones that had lost thiostrepton resistance was
analysed by Southern blot hybridisation. Clones in which
the DNA encoding the AT12 domain had been replaced by the
DNA encoding the AT5 domain were designated S.
cinnamonensis M12-AT5. At this time individual colonies
were toothpicked onto solid A medium and allowed to grow.
Four representative colonies from the A medium plate were
grown up in liquid modified YEME medium, which has
composition as follows:
Modified YEME medium

Sucrose                  100 g
DIFCO Yeast extract      3 g
Bacto peptone            5 g
Oxoid Malt extract       3 g
Glucose                  10 g
H2O                      to 1 L

pH adjusted to pH 7.2 with NaOH.

These cultures were used to provide a 2% vol/vol
inoculum for 30 ml of modified YEME which was grown for 7
days, and then transferred to SM16 medium, which has
composition as follows:

SM16 medium

3-[N-Morpholino]-propane sulfonic
acid (MOPS) buffer                           20.9 g
L-proline                                    10.0 g
Glucose                                      20 g
NaCl                                         0.5 g
K2HPO4                                       2.1 g
Ethylenediaminetetraacetic acid,
sodium salt                                  0.25 g
MgSO4.7H2O                                   0.49 g
CaCl2.2H2O                                   0.029 g
Trace elements solution (Hopwood,
D. A. et al. (1985) Genetic
Manipulation of Streptomyces - a
Laboratory Manual, at p. 235)                2 ml
0.5 M CoCl2 solution                         2 microlitres
H2O                                          to 1 L

pH adjusted to pH 7 with NaOH.

After growth for a further 7 days, mycelium was
collected by centrifugation at 2000 x g for 30 minutes,
and the supernatant was extracted three times with 300 ml
of ethyl acetate. To confirm presence of the C-2-ethyl
substituents of both monensin A and B, the combined
extracts were concentrated by evaporation under reduced
pressure to an oil, which was mixed with 1 ml of methanol.
Samples were applied to an LCQ liquid chromatograph fitted
with a mass spectrometer detector unit. The column used
was a C18 reversed phase column, equilibrated with a
mixture of 80% 20 mM ammonium acetate/20% acetonitrile, and
the column was eluted with a gradient of increasing
acetonitrile, reaching 100% acetonitrile over 24 minutes.
Mass ions 14 mass units above those expected for both
monensin A and B confirmed production of the respective
C-2-ethyl substituents.
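
As a rough numerical sketch of the elution programme used in this analysis, the following Python function estimates the acetonitrile content of the mobile phase at a given time. It assumes a linear ramp from the 20% starting mixture to 100% over 24 minutes (the text states only that acetonitrile increases over that period) and ignores any instrument delay volume; the function name and defaults are hypothetical.

```python
def acetonitrile_percent(t_min, start_pct=20.0, end_pct=100.0, ramp_min=24.0):
    """Estimated % acetonitrile at time t_min, assuming a linear gradient."""
    if t_min <= 0:
        return start_pct
    if t_min >= ramp_min:
        return end_pct
    return start_pct + (end_pct - start_pct) * t_min / ramp_min


# Composition at a few illustrative time points along the ramp.
for minutes in (0, 6, 12, 18, 24):
    print(f"{minutes:>2} min: {acetonitrile_percent(minutes):.1f}% acetonitrile")
```

Under these assumptions the midpoint of the ramp (12 minutes) corresponds to 60% acetonitrile; a real instrument would typically deliver this composition to the column somewhat later.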

Example 7. Construction of pSGKOOS and its use in the
production of C-13 propyl-erythromycin

Plasmid pSGKOOS is a pCJR24 based plasmid containing
a PKS gene comprising a loading module plus the first and
second extension modules and the chain terminating
thioesterase of the PKS responsible for the production of
erythromycin (DEBS). The loading module comprises the KS
and ethylmalonyl-CoA-specific AT from module 5 of the
monensin PKS linked to the DEBS loading ACP domain. In
addition, the active site cysteine of this module 5 KS has
been mutated to glutamine to convert an extender di-domain
to a loading di-domain. Plasmid pSGKOOS was constructed
as follows.

A 2769bp DNA segment of the monensin cluster of S.
cinnamonensis extending from nucleotide 42438 to 45207 was
amplified by PCR using the following oligonucleotide
primers: 5'-GTGACGTCATATGTCGAGTGCTGAAGAGTCG-3' and
5'-GGGGTCGCCTAGGAACCAGCGCCGCCAGTCGA-3'.

The design of these primers introduced Nde I and Avr
II sites at the ends of the amplified fragment. Monensin
cosmid 05 was used as a template for the reaction. The
resulting 2769bp fragment was digested with Nde I and Xho
I and a 656bp fragment (Fragment A) purified by
preparative gel electrophoresis.

A second PCR reaction was used with the same template
to amplify DNA from nucleotide 43098 to 45207. The
primers used were
5'-CGGCCTCGAGGGCCCGTCGGTCAGTGTCGACACGGCGCAGTCCTCCTCGC-3'
and 5'-GGGGTCGCCTAGGAACCAGCGCCGCCAGTCGA-3'.

The design of the upstream oligonucleotide primer
incorporated a change of the codon specifying the KS
active site cysteine (nucleotides 43135-43137, TGC) to
glutamine (CAG). The resulting 2109bp DNA fragment
(Fragment B) was digested with Xho I and Avr II and
purified by preparative gel electrophoresis.
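
The codon change built into the upstream primer can be checked with a few lines of Python. This is a sketch under one stated assumption: that the first base of the primer corresponds to cluster nucleotide 43098, the start of the amplified region given in the text. The variable and function names are hypothetical; the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


UPSTREAM_PRIMER = "CGGCCTCGAGGGCCCGTCGGTCAGTGTCGACACGGCGCAGTCCTCCTCGC"
PRIMER_START = 43098   # assumption: primer base 1 aligns with this nucleotide
CODON_START = 43135    # first base of the KS active-site codon (from the text)
FRAGMENT_END = 45207   # end of the amplified region (from the text)

offset = CODON_START - PRIMER_START            # 0-based index into the primer
mutant_codon = UPSTREAM_PRIMER[offset:offset + 3]
# Four codons either side of the mutated position, read in the same frame.
context = translate(UPSTREAM_PRIMER[offset - 12:offset + 12])

print("codon in primer:", mutant_codon, "->", CODON_TABLE[mutant_codon])
print("wild-type codon TGC ->", CODON_TABLE["TGC"])
print("local translation:", context)
print("Xho I site present:", "CTCGAG" in UPSTREAM_PRIMER)
print("expected fragment length:", FRAGMENT_END - PRIMER_START)
```

Under the stated alignment, the primer carries CAG (glutamine) at the position where the wild-type sequence has TGC (cysteine), and the coordinate difference matches the 2109bp quoted for Fragment B.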

Plasmid pCJWSO is derived from pCJR24 and DEBS1-TE in
which Msc I and Avr II sites have been introduced to flank
the AT of the DEBS loading module. This plasmid was
digested with Nde I and Avr II and the larger fragment
(Fragment C) purified by preparative gel electrophoresis.

The three fragments (Fragments A, B, C) were ligated
together using T4 DNA ligase and the ligation mixture used
to transform electrocompetent E. coli DH10B cells.
Individual clones were checked for the presence of the
desired plasmid pSGKOOS. The identity of pSGKOOS was
confirmed by restriction pattern and sequence analysis.

Plasmid pSGKOOS was used to transform S. erythraea
NRRL2338 using a routine protoplast transformation
technique. Thiostrepton resistant colonies were selected
on R2T20 media containing g/ml thiostrepton. Integration
of pSGKOOS into the S. erythraea NRRL2338 chromosome was
confirmed by Southern blot hybridisation of genomic DNA
with DIG-labelled DNA containing the actII-orf4 promoter.
The culture S. erythraea NRRL2338 (pSGKOOS) was inoculated
into 5 ml tap water medium in a 30 ml flask. After three days
incubation at IS'^C this flask was used to inoculate 30 ml of
Ery-P medium in a 300 ml flask. The broth was incubated at
29°C at 200 rpm for 6 days. After this time the whole broth
was adjusted to pH 8.5 with NaOH, and then extracted twice
with an equal volume of ethyl acetate. The ethyl acetate
extract was evaporated to dryness at 45°C under a nitrogen
stream using a Zymark Turbovap LV evaporator. The product
identities were confirmed by LC/MS. A peak was observed
with an m/z value of 734 (M+H)+, required for erythromycin A.
A second peak was observed with an m/z value of 748 (M+H)+,
required for 13-propyl erythromycin A.
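
The 14-unit difference between the two observed ions is what one additional methylene group would contribute. The Python sketch below makes the arithmetic explicit using nominal (integer) atomic masses; the variable names are illustrative and the m/z values are those quoted above.

```python
# Nominal (integer) atomic masses, sufficient for unit-resolution m/z values.
NOMINAL_MASS = {"C": 12, "H": 1}

# One methylene unit, CH2.
methylene = NOMINAL_MASS["C"] + 2 * NOMINAL_MASS["H"]

# (M+H)+ values reported in the text.
observed_mz = {
    "erythromycin A": 734,
    "13-propyl erythromycin A": 748,
}

shift = observed_mz["13-propyl erythromycin A"] - observed_mz["erythromycin A"]
extra_methylenes = shift // methylene

print(f"mass shift: {shift} units = {extra_methylenes} x CH2")
```

The same reasoning applies to the monensin analogues of Example 6, where ions 14 mass units above those of monensin A and B were taken as evidence of an ethyl in place of a methyl substituent.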
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TABLE I 



gene      function                                      start    end

gdhA      glutamate dehydrogenase (partial)             1038     0
dapA      dihydrodipicolinate synthase                  2140     1220
orf3      putative transcriptional activator            2211     3152
orf4      hypothetical protein                          3264     3680
orf5      hypothetical protein                          4307     3684
orf6      hypothetical protein                          4570     4758
orf7      hypothetical protein                          5058     5612
acpX      acyl carrier protein                          6010     5693
ksX       ketoacyl synthase                             8531     6045
monCII    probable epoxyhydrolase/cyclase               9542     8643
monE      methyltransferase                             10426    9596
monT      monensin resistance gene (ABC-transporter)    10656    12191
monRII    probable repressor                            12205    12780
monAIX    thioesterase                                  13829    13023
monAI     polyketide synthase loading & module 1        14121    23198
KS-L                                                    14172    15486
AT-L      malonate specific                             15777    16880
ACP-L                                                   17019    17276
KS1                                                     17358    18626
AT1       methylmalonate specific                       18960    19976
DH1       (potential)                                   20019    20519
KR1       (inactive)                                    21636    22241
ACP1                                                    22536    22793
monAII    polyketide synthase module 2                  23205    29921
KS2                                                     23307    24569
AT2       methylmalonate specific                       24891    25913
DH2                                                     25953    26369
ER2                                                     27600    28463
KR2                                                     28485    29042
ACP2                                                    29313    29570
monAIII   polyketide synthase modules 3 & 4             29974    42372
KS3                                                     30076    31347
AT3       malonate specific                             31798    32838
DH3                                                     32884    33465
KR3                                                     34692    35181
ACP3                                                    35553    36811
KS4                                                     35899    37170
AT4       methylmalonate specific                       37489    38511
DH4                                                     38557    38982
ER4                                                     40123    40986
KR4                                                     41005    41562
ACP4                                                    41848    42105
monAIV    polyketide synthase modules 5 & 6             42448    54564
KS5                                                     42^8     43890
AT5       ethylmalonate specific                        44221    45243
DH5                                                     45289    45744
KR5                                                     46785    47337
ACP5                                                    47593    47850
KS6
AT6       malonate specific
DH6
ER6
KR6
ACP6
monA      polyketide synthase modules 7 & 8
KS7
AT7       methylmalonate specific
DH7
KR7
ACP7
KS8
AT8       malonate specific
DH8
ER8
KR8
ACP8
monA      polyketide synthase module 9
KS9
AT9       malonate specific
KR9       (potential)
ACP9
monH      probable regulator
monCI     FAD containing epoxidase
monBI     double bond isomerase
monBI     double bond isomerase
monA      polyketide synthase modules 11 & 12
KS11
AT11      methylmalonate specific
KR11
ACP11
KS12
AT12      methylmalonate specific
DH12      (potential) delta
ER12      (potential)
KR12
ACP12
monA      polyketide synthase module 10
KS10
AT10      methylmalonate specific
KR10
ACP10
monD      P450 oxygenase
monRI     probable activator
monA      thioesterase
orf29     cell wall biosynthesis capK
lipB      lipase B
orf31     ion pump
orf32     membrane structural protein
amtA      glycine amidinotransferase
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^nR44 
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S4052 
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'viRl4 


66934 
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55978 

www f w 


OOOl/U 


57319 




57802 
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60124 

WW 1 *»~ 




61453 

\J 1 ~ w w 


61808 


62839 


62882 


63316 


64577 


65437 

^^^^ • 


65456 


66016 


66404 


66661 
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72054 


67075 


68340 


6fi6Qfi 


69729 

Ww f £i9 


(\Jl OO 


71262 


# lOOO 


71783 

1 1 f Ow 




74003 


7RR4'l 






76538 


774'>n 


77016 


DOf wQ 


77447 
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87344 
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85993 
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84562 
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82604 


04004 


81?)85 




808*15 


7Q61ft 


78914 


7AftO^ 


78337 


70070 

f OVIf V/ 


77812 


Q^741 


88816 




92368 

W«i Www 




91021 




89584 




89068 

Owl7W 


94081 


95273 


96141 


95338 


96941 


96138 


97580 


98953 


99983 


98991 


101433 


100507 


102581 


101490 


102924 


103450 
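
The coordinates in Table I can be cross-checked against the protein lengths stated in Table II. The Python sketch below does this for a handful of genes. It assumes that the start and end columns of Table I pair up row by row in the order printed, that coordinates are inclusive, and that the stop codon lies outside the listed span; gene labels follow the Table II headings, and the function names are hypothetical.

```python
# (start, end) nucleotide coordinates as read from Table I.
# The row-by-row pairing of the two columns is an assumption of this sketch.
TABLE_I_COORDS = {
    "gdhA": (1038, 0),        # partial coding sequence
    "dapA": (2140, 1220),
    "orf3": (2211, 3152),
    "monT": (10656, 12191),
    "monRII": (12205, 12780),
    "monAIX": (13829, 13023),
    "monAI": (14121, 23198),
    "monAII": (23205, 29921),
    "monAIII": (29974, 42372),
}

# Lengths in amino acids as stated in the Table II headings.
TABLE_II_LENGTH_AA = {
    "gdhA": 346,
    "dapA": 307,
    "orf3": 314,
    "monT": 512,
    "monRII": 192,
    "monAIX": 269,
    "monAI": 3026,
    "monAII": 2239,
    "monAIII": 4133,
}


def coding_length_aa(start, end):
    """Residues encoded by an inclusive span that excludes the stop codon."""
    return (abs(end - start) + 1) // 3


def strand(start, end):
    """Orientation implied by the order of the two coordinates."""
    return "forward" if end > start else "reverse"


for gene, (start, end) in TABLE_I_COORDS.items():
    computed = coding_length_aa(start, end)
    stated = TABLE_II_LENGTH_AA[gene]
    status = "ok" if computed == stated else "MISMATCH"
    print(f"{gene:8s} {strand(start, end):8s} {computed:5d} aa (stated {stated}) {status}")
```

For the genes listed here the computed and stated lengths agree, which supports the row-by-row pairing; a start coordinate larger than the end coordinate is read as a gene on the reverse strand.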
TABLE II

GdhA, glutamate dehydrogenase (partial coding sequence) Length: 346
amino acids

1 LTTRPDTKTA LSQKTALSQL LTEIEHRNPA QPEFHQAARE VLETLAPVIA 
51 ARPEYAEAGL lERLCEPERQ IVFRVPWQDD HGRVRVNRGF RVEFNSALGP 
101 YKGGLRFHPS VNLGVIKFLG FEQIFKNALT GLGIGGGKGG SDFDPRGRSD 
151 AEVMRFCQSF • MTELYRHIGE HTDVPAGDIG VGGREIGYLF GQYRRITNRW 
201 EAGVLTGKGR NWGGSLIRPE ATOYGNVLFA AAMLRERGET LEGRTAWSG 
251 SGNVAIYTIQ KLAALGANAV TCSDSSGYW DEKGIDLDLL KQVKBYBRAR 
301 VDTYAQRRQA SARFVPGRRV WEVPADIALP SATQNELDAD DATALI 

DapA, dihydrodipicolinate synthase Length: 307 amino acids

1 MTLASSLEPT TEPLFNGLYV PLVTPFTDDL RLAPBALARL ADEALSAGAS 

51 GLVALGTTAE AATLTAEERE TVIRVCSAAC RAHGAPLIVG VGTNDTATAI 

101 TALRELAARG DVAAALVPAP PYIRPGEAQT LAHFAALAEH GGLPLWYDI 

151 PYRTGQTI.GA GTITALGRLP EWGIKHATG SIDPTTMELL DSPLPGFAVL 

201 GGDDIVLSPL VAAGAHGGIV ASANLRTADY ABMIALWRRO SAAPARALGA 

251 DLARLSAALF TEPNPTVIKG VLHAC2NRIPS PAVRMPLLAA SADSVRRAAP 

301 LAASRK* 

ORF3, putative transcriptional activator protein Length: 314 amino acids 

1 MLDVRRLHLL.' RELDRRGTIA AVABALTFTA SAVSQQLGVL EREAGVPLLB 

51 RSGRRWLTP AGRSLVAHAD AVLNRLEQAV AELAGARDGI GGPLRIGTPP 

101 SGGHTIVPGA LAELASRHPA LEPMVREIDS ARVSDGLRAG ELDVALVHDY- 

151 DFVPATPDTT VDBVPLLEEP MYLVTHAADT ATDSGSGSTL AALL6PCABV 

201 PWITARDGTT GHAMAVRAGQ AAGFQPRIRH QVNDFRTVLA LVAAGQGAGF 

251 VPRMAASPSP AGWLTKLPL FRRSKVAFRA GGGAHPAIAA FVAAATTAVE 
301 RMAGSRGPAG GSE*

ORF4, hypothetical protein Length: 139 amino acids 

1 MADDAYLFLL PDRHPRLGAA LAAVGALECT ETPAVHAWLQ AHEASVSSEQ 
51 VRILPADABT LIPiODAERLP VPLSEEEALK VEQECAPQTV TDMESELLAF 
101 RETTQDWQAL VHRALTAGIP AQRIARLTGL DPEEIGRL* 

ORF5, hypothetical protein Length: 208 amino acids

1 LAVAACAAW LPIDAWRIS AADVGVLVFF AYLLPYLAIT MTVFVSVAPE 
51 QVRSWARREA RGTFLQRYVL GTAPGPGGSL FIAAAALWA VLWLPGHLST 
101 TFSALPRTLV ALALWAAWI CVWAFAVTF QADNLVENER ALEFPGBRSP 
151 AWADYVYFAL AAMTTPGTTD VDVTSRDMRR TVAANTVIAF VFNTVTVAIL 
201 VSALGGR* 

ORF6, hypothetical protein Length: 63 amino acids

. 1 MTVMDKLKQM LKGHEDKAGQ GIDKAGDFVD GKTQGKYSGQ VDTAQDKLRD 
51 QFGSDQQEPP QR* 

* 

ORF7, hypothetical protein Length: 185 amino acids 

1 MGTAQSQEQA AAPGACAAFV RFVLCGGGVG LASSFAWAL ASWVPWALAN 

51 ALVAWSTW ATELHARFTF GAGGRATWRQ HAQSAGSAAA AYAVTCVAMF 

101 VLQQLVAAPG AVLEQWYLS ASALAGVARF WLRLWPAR NRSLPAAAAV 

151 RTARPVRRVP APVPATVAHA ASRPAGPAAL CPAA* 

AcpX, acyl carrier protein (ACP) Length: 106 amino acids 

1 MTSTDHTSGQ DATELEKQLA AATPEEREKL LTDTIRTQAG TLLNTTLSDD 
51 SNFLENGIiNS LTAIBLTiCTL MTLTGMEIAM VAIVBNPTPA QLAHHLGQEL 
101 AHTTA* 
KsX, ketoacyl-ACP synthase Length: 829 amino acids

1 VANEEKLVEY LKWTTl^ELHQ AQQQLRELKA AQHEPIAWS MACRLPGKTR 

51 TPDDLWDLVS EGRDAVTGFP DDRAWELPEE RPYAELGGFL DDAAGFDAGP 

101 FDISDTEAVA TEPLQRLMLH LAWETVERGH lAPHTLRSTL TGVYVGATGH 

151 DYATRLETAP DELLPYLGGG TSGSLVSGRI AYALGLEGPA JSVDTACSSS 

201 LVALHLACQA LRRGECGLAL AGGGTVMSTP HTFHAFAHQK SLAQDGRCKP 

251 FAAAADGMGL GEGVGLVLLE RLGDARKNGH PVLAVIRQSA VNQDGAGYGL 

301 AAPNGPSQQH VIRA^LADAG LTPDQIDAVE AHGTGTPIGD AIEVQALLAT 

351 YGADRSPDRP LWLGSVKSNT GHTQGAAGAA ALIKMVQAFR HGTLPPTLHV 

401 DRPTPLAAWK KGAVRLLTEA VDWPRREEPR RVGISAFATS GTNAHLILEE 

451 PPVDEAPVPD AARDQTSPVA PELPVAWSLS ARTPEALRAQ AKALVTHLAA 

501 TDPAPSPAEV AYSLAATRSP LEHRAVLTGT DHTELLAAAR ALAAGEDHPD 

551 LVRSTPGA6P KKIAWHFDGR PADGVTTGAA PGAKPGATFG ATFGAAFGGA 

601 EFHSAFPLFA SAFDEARALL DTHLPTPLPT PHSELARFAV HTALARLLLE 

651 TGVRPHTLTG DGVGHIAAAY AAGILTLDDA CRLAAAHAAA AQAAEGEQPA 

701 PPDAYEPVLK QLTFQRATLT LTSTAPADTP lASADYWHHH LTSPAPTAPP 

751 TPETHTLLHL GALSPEGTQT SAVSALLTAL ARLHTTGGTV DWTPLVRRTP 

801 HPRTIIDLPTY SFQATRYWLH DHTAHAAV* 

MonCII, probable epoxyhydrolase/cyclase Length: 300 amino acids

1 VKNLRIPVSQ TVSLNVRYRP ADGPGAPGRP FLLLHGMLSN AEMWDEVAAR 
51 LAAAGHPAYA VDHRGHGESD TPPDGYDNAT WTDLVAAVT ALDLSGALVA 
101 GHSWGAHLAL RLAABHPDLV AGLALIDGGW YEFDGPVMRA FWBRTADWR 


151 RAQQGTTSAA DMRAYMIATH PDWSPTSIBA RLADYRVGPD GLLIPRLTST 
201 QVMSIVAGLQ REAPADWYPK VTVPVRLLPL IPAIPQLSDQ VRAWVAAAEA 
251 ALEQVSVRWY PGSDHDLHA.G APDEIAADLL LLARSCEAMP GGKAGVRPA*

MonE, S-adenosylmethionine-dependent methyltransferase Length: 277
amino acids

1 VNKTVAPEPS DIGHYYDHKV FDLMTQLGDG NLHYGYWFDG GEQQATFDEA 
51 MVQMTDEMIR RLDPAPGDRV LDIGCGNGTP AMQLARARDV EWGISVSAR 
101 QVERGNRRAR EAGLADRVRF EQVDAMNLPF DDGSFDHCWA LESMLHMPDK 
151 QQVLTEAHRV VKPGARMPIA DMVYUSTPDPS RPRTATVSDT TIYAALTDIG 
201 DYPDIFRAAG WTVLELTDIT RETAKTYDGY VEWIRAHRDE YVDIIGVEGY 
251 ELFLHNQAAL GKMPELGYIF ATAQRP* 

MonT, putative monensin resistance gene (ABC-transporter) Length: 512
amino acids

1 MSADLGARRW WAVGALVLAS MWGFDVTIL SItALPAMADD LGANNVELQW
51 FVTSYTLVFA AGMIPAGMLG DRFGRKKVLL TALVIFGIAS LACAYATSSG
101 TFIGARAVLG LGAALIMPTT LSLLPVMFSD EERPKAIGAV AGAAMLAYPL
151 GPILGGYLLN HFWWGSVFLI NVPWILAFL AVSAWLPESK AKEAKPFDIG
201 GLVFSSVGLA ALTYGVIQGG EKGWTDVTTL VPCIGGLLAL VLFWWBKRV
251 ADPLVDLSLF RSARFTSGTM LGTVINFTMF GVLFTMPQYY QAVLGTDAMG
301 SGFRLLFMVG GLLVGVTVAN KVAKALGPKT AVGIGFALLA AALFYGATTD
351 VSSGTGLAAA WTAAYGLGLG lALPTAMDAA LGALSEDSAG VGSGVNQSIR
401 TLGGSFGAAI LGSILNSGYR GKLDLDGVPE QAHGAVKDSy FGGLAVARAI
451 KSNGLiADSVR SAYVHALDW LWSGGLGLL GWLAWWLP RHVGQSTAKT
501 AESEHEAADA V*

MonRII, probable repressor protein Length: 192 amino acids 

1 VPGLRERKKA RTKAAICREA VRLf^iEQGYT ATTIEQIAEA AEVAPSTVFR 
51 yPATKQDLVF SHDYDLPFAM MVQAQSPDLT PIQAERQAIR SMLQDISEQE 
101 LALQRERFVL ILSEPELWGA SLGNIGQTMQ IMSEQVAKRA GRDPRDPAVR
151 AYTGAVFGVM LQVSMDWAND PDMDFATTLD EALHYLEDLR P* • 

MonAIX, thioesterase Length: 269 amino acids 

1 MDRGTAARAP QIGDEFGAAT GNGVWLRRYH AAAEAPVRLV CFPFAGGSAS 
51 YYPGLSGLLA PGVEVLAVQY PGRQDRHAEP CLASVAELAD iJVVPHLPCDG 
101 KPFALFGHSL GAIVAFEVAR RLRGPAGPGL PVHLFVSGGL ARPYRPAGRS 
151 GAFGDADILA HLRAMGGTDE^RFFRSPELQE LVLPALRADY RAVATYEAPG 
201 PGRLDCPITA LIGDADERTS PEQAATWRER TGAAFDLRVL PGGHFYLDGC 
251 QEQVAAWTE ALTAGPGV* 

MonAI, polyketide synthase multi-enzyme MONSl, housing loading module 
and extension module 1 Length: 3026 amino acids 

1 MAASASASPS GPSAGPDPIA WGMACRLPG APDPDAFWRL LSEGRSAVST 

51 APPERRRADS GLHGPGGYLD RIDGFDADFF HISPREAVAM DPQQRLLLEL 

- 10 i SWEALEDAGI RPPTLARSRT GVFVGAFWDD YTDVLNLRAP GAVTRHTMTG ' 

151 VHRSILANRI SYAYHLAGPS LTVDTAQSSS LVAVHLACES IRSGDSDIAP 

201 AGGVNLICSP RTTELAAARF GGLSAAGRCH TFDARADGFV RGBGGGLWL 

251 KPLAAARRDO DTVYCVIRGS AVNSDGTTDG ITLPSGQAQQ DWRLACRRA 

301 RITPDQVQYV ELHGTGTPVG DPIEAAALGA ALGQDAARAV PLAVGSAKTN 

351 VGHLEAAA6I VGLLKTALSI HHRRLAPSLN FTTPNPAIPL ADLGLTVQQD 

401 LADWPRPEQP LIAGVSSFGM GGTNGHWVA AAPDSVAVPE PVGVPERVEV . " . 

451 PEPVWSEPV WPTPWPVSA HSASALRAQA GRLRTHLAAH RPTPDAARVG 

501 HALATTRAPL "aHRAVLLGGD TAELLGSLDA LABGAETASI VRGBAYTEGR 

551 TAFLFSGQGA ^RLGMGRELY AVFPVFADAL D^FAALDVH LDRPLREIVL ••■ 

601 GETDSGtaJVS GBNVIGBGAD HQALLDQTAY TQPALFAIET SLYRLAASFG 
651 LKPDYVLGHS VGEIAAAHVA GVLSLPDASA LVATRGRLMQ AVRAPGAMAA
701 WQATADEAAE QLAGHERHVT VAAVNGPDSV WSGDRATVD ELTAAWRGRG 
751 RKAHHLKVSH AFHSPm©PI LDELRAVAAG LTFHBPVIPV VSNVTGELVT 
801 ATATGSGAGQ ADPEYWARHA REPVRFLSGV RGLCERGVTT FVELGPDAPL 
851 SAMARDCPPA PADRSRPRPA AIATCRRGRD EVATFLRSLA QAYVRQADVD 
901 FTRAYGATAT RRFPLPTYPF QRERHWPAAA GVGQQPETPE LPESSESSEQ 
951 AGHEREEGAR AWGGPEGRLA GLSVNDQERV LLGLVTKHVA WLGDASGTV 
1001 QAARTFKQLG FDSMAAAELS ERLGTETGLP LPATLTFDYP TPLAVAAHLR 
1051 AELTGTPAPA GSAPATGALG AGDLGTDEDP VAIVAMSCRY PGGAGTPEODL 
1101 WRLVADGADA IGDFPTDRGW DLA^FHPDP DRSGTSCTRQ GGFLYDAADP 
1151 DAEFFDISPR EALAVDPQQR LLLECAMBAF BRAGLDPRAL KGSPTGVFVG 
1201 MTGQDYGPRL HEPSQATDGY LLTGSTPSVA SGRLSFSFGL EGPALTVDTA 
1251 CSSSLVTLHL AAQALPRGEC DLALAGGATV LATPGMFTEF SRQRGLAPDG 
1301 RCKPFAAGftD GTGWAEGVGL VLLERLSEAR RKGHAVLAVI RGSAINQDGA 
1351 SNGLTAPNGP SQQRVIRAAL AAARLTAD^ DWEAHGTGT TLGDPIEAQA 
1401. LLATYGQGRS. AERPLWLGSV KSNIGHTQAA AGVAGVIKMV MAMRHDLLPA 
1451 TLHVDEPSGH VDWSTGAVRL LTBPWWPRG ERPRRAAVSS FGISGTNAHL 
1501 VI^EAGQDEY VAGAADDAGP VDGAVLPWW SGRTGAALRB QARRLRELVT ■ 
1551 GGSADVSVSG VGRSLVTTRA VFEHRAVWG RDRDTLIGGL EMAAGDASP 
. 1^01 DWCGVAGDV GPGPVLVFPG QGSQWVGMGA QLLGESAVFA ARIDACEQAL 
1-651 . SPYVDWSLTE VLRGDGRBLS RVDWQPVLW AVMVSLAAVW ADHGVTPAAV 
. 1701 VGHSQGEIAA VWAOALTLE DGAKIVALRS RALRQLSGGG AMASLGVGQE 
. 1751 QAAELVEGHP. GVGIAAVNGP. SSTVISGPPE QVAAWADAE ARELRGRVID 
1801 VDYASHSPQV DAITDELTHT LSGVRPTTAP VAFYSAVTGT RIDTAGUDTD 
1851 YWVTNLRRPV RFADAVTALL ADGHRVFIEA SSHPVLTLGL QETFEEAGVD
1901 AVTVPTLRRE DGGRARLARS LAQAFGAGCA VRWENWPPAT GTSTVELPTY 
1951 AFQRRRYWLE APTGTQDAAG LGLAAAGHPL LGAATEIADG DIRLLTGRIS ' 
. 2001 RHSHPWLAQH TLFGAAWPA SVIAEWALRA ADEAGCPRVD DLTLRTPLVL 
2051 PETAGVQVQI WGPADARDG HRDFHVYARP DGKDASEGEG IAE6EGASEG 
2101 EGASGGTDAP WTCHADORLV AEPTGTASED SPDTVWPPPG'=AEPVDLGDFY 
2151 ERAAATGVGY GPVFTGLRAL WRRDGELFAE AVLPQEAPBT AGFGMHPALL 
2201 DAALHPALLG ERPAEEDK^ LPFTLTGVTL WATGATSVRV RLTPLDDDPD 
22.51 ASADGRAWRV GVSDPTGAEV l^TCEALVAVA AGRRELRAAG ERVSDLYAVE 
2301 WVPVPGPGPV GBGADFSGWA GLGECGBRWE CVGRVERWYE DLDALGAAVE 
2351 GGASVPSWL ATAAAAPGGA GDGAADALSA VRWTGALLDQ WLADARFADA 
2401 RLWITSGAV ATGDDFLPDP AAAAVRGLVE QAQVRHPGRI LLVDTEAGAG 
2451 LGVGAGVDDA LLEQAVAMAL GADEPQLRLR AGRVLAPRLT APQDAAVTEA 
2501 ARPLDPDGTV LITGEIAGAPV- ADLAEHLVRT GQCRHLLLLP GDGBLEEMAE 

2551 ELRGLQATVD LSTADPADPT ALABWAAVE GDHPLTGVIH ATGWDAFDP 

.2601 GDSASDLMID SASDSFAEAW SSRAGVTAAL HTATAHLPLD LFAVLSPAGA 

2651 DLQIARSAAA AGADAFSAAL ALRRHTTVTT DTTAPPRTTA PPRTTASPRT 

2701 TALSSSRTTG VAIAYGPPTA PRPGIKGTAP GRIPVLLDAA RAHGG6SPLL 

. 2751 GARIAARMA AESAAEGVAG LPAPLRALAV AAAAAGAPTR RTAADRKPPA 

2801 DWPARLAPLS.APEQLRLLID AVRTHAAAVL GRTDPEALRG DATFKQLGLD. 

2851 SLTAVELRNR LVKDTGLRLP TALVFRYPTP AAIAAHLRER LTSPSBTTAT 

2901 QRSGGQTPAA GQASSALAPG 6SAAGPPAAD TVLSDI.TRME NTLSVLAAQl. 

2951 PHTETGEITT RLEALLTRWK TTNATANDSG DGNGGDDDAA ERLKAASADQ 

3001 IFDFIDNELG VGHGTSRVTP TPKAG* 
MonAII, polyketide synthase multi-enzyme MONS2, housing extension
module 2 Length: 2239 amino acids

1 MASEEQLVEY LRRVTTELHD TRRRLVQEED RRQEPVALVG MACRFPGGVA 

51 SPEDLWDLVA AGKDAIEDPP TDRGWDLEAL YDPDPAAYGT SYVRHGGFVD 

101 DAGSFDADFF GISPREALAM DPQQRLMLET SWELFERAGI EPVSLKGSRT 

151 GVYAGVSSED YMSQLPRIPE GFEGHATTGS LTSVISGRVA YNYGLEGPAV 

201 TVDTACSASI. VAIHLASQAL RQRECDLALA GGVLVLSSPL MFTBFCRQRG 

251 LAPDGRCKPF AAAADGTGFS EGIGLLLLER LSDARRNGHK VLAVIRGSAV 

301 NQDGASNGLT APNDAAQEQV IRAALDNARL TPSEVDAVEA HGTGTKLGDP • 

351 lEAGALLATY GQHRARPLLL GSLKSNIGHT HATAGVAGVI KTVMAIRNGL 

401 LPATLHVEBL SPHVDWDAGA VEWTEPTPW PETGHPRRAG VSAFGISGTN 

451 AHLILEEAPP BEDVPAPWV ESGGWPWW SGRTPEALRE QARRLGEFVA 

• 501 GDTDALPNBV OWSLATTRSV FEHRAWVGR DRDALTAGLG ALAAGEASAG ,. 
551 • WAGVAGDVG PGPVLVFPGQ GAQWVGMGAQ LLDESAVFAA RIAECERALS 
601 AHVDWSLSAV LRGDGSBLSR .VBWQPVLWA VMVSLAAVWA DYGVTPAAVI- 

.651. GHSQGEMAAA CVAGALSLED AARIVAVRSD ALRQLQGHGD MASLSTGABQ 
701 AAELIGDRPG VWAAVN6PS STVISGPPEH VAAWADAEA RGLRARVIDV . 

• • 751 GYASHGPQIt) QLHDLLTERL ADIRPTNTDV AFYSTVTAER LTDTTALDTD 

801 YWVTNLRQPV RFADTIEALL ADGYRLFIEA SAHPVLGLGM EETIEQADMP ■ 
851 ATVVPTLRRD HGDTTQLTRA AAHAFTAGAD VDWRRWFPAD. PAPRTIDLPT 
•901 YAFQRRRYWL ADTVKRDSGW DPAGSGHAQL PTAVALADGG WLNGRVSAE 
951 RGGWLGGHW AGTVLVPGAA LVEWVLRAGD EAGCPSLEBL TLQAPLVLPB 
. 1001 SGGLQVQWV GAADEQGGRR DVHVYSRSEQ DASAVWQCHA VGBLGRASVA 
105^ RPVRQAGQWP PAGAEPVEVG GFYBGVAAAG YEYGPAFRSL RAMWRHGDDL 
1101 LAEVELPEEA GSPAGFGIHP ALLDAALHPL lAQRSRDGAG AGAH6GQVLL
1151 PFSWSGVSLW ASEATTWVR LTGLGGGDDE TVSLTVTDPA GGPWDVAEL 
1201 RLRSTSARQV RGSAGPGADG LYELRWTPLP BPLPVPAPAN GRDVAADLSG 
1251 CAVLGELVAE PGPGIDLEGC PCYPGVGALA DNASPPSMIL APVHSDTTGG 
1301 DGLALTERVL RVIQDFLAAP SLEQKQTRLA FVTRGAADTG^STTGGSAAPA 
1351 EAVDPAVAAV WGLVRSAQSE NPGR5VLLDT DAPLDQASVA ^LVDAVRSAV 
1401 EM)EPQVALR GGRLLVPRWA RAGEPVELAG PAGARAWRLV GGDSGTLEAV 
1451 VAEACDDIVL RPLAPGQV^ AVHTA6VNFR DVLIALGMYP DPDALPGTEA 
1501 A6WTEVGPG VTRLSVGDRV MGMMDGAFGP WAVADARMIA PVPPGWGTRQ 
1551 AAAAPAAFLT AWYGLVELAG LKAGERVLIH AATGGVGMAA VQIARHVGAE 
1601 VFATASPGKH AVLEEMGIDA AHRASSRDLA FEDAFRQATD GRGVDWLNS 
1651 LTGELimSL RLLGDGGRFV EMGKSDPRDP ELVALEHPGV SYEAFDLVAD 
1701 AGPERX.GLML DRLGELFAGG SLVPLPVTAW PLGRAR^R HMSQARHTGK 
1751 LVLDVPAPLD PD6TVLVTGG TGTIGAAVAE HLARTGBSKH LLIVSRSGPA 
1801 AHGAEELVSR lAEFGAE^TF VAADVSEPDA VAALIEGIDP AHPLTGWHA 
1851 AGVLDNALIG . SQTTESLTRV WAAKAAAAQQ LH^TRESRL GLFVMFSSFA 
1901 STMGTPGQAN YSAANAYCDA L&ALRRAEGL AGLSVAWGLW EATSGLTGTL 
1951 SAADRARIDR YGIRPTSAAR GCALLAA^ HGRPDLLAMD LDARVPAASD 
2001 APVPAVLRTL AAAGAPATAR PTAAAAADGA TDWSGRLAGL TEEARLELLT 
2051 ELVCTHAAGV LGHADAGAVO VDAPFKELOF DSLTAVELRN RIAAATGLKL 
2101 PAALVFDYPQ ARVLAAHLAE RLVPEGAGAM GGVSGAEGVR DAYGAGGPGG 
2151 DMTAQVLLEV ARVEITTLSAA VPHGIDRAAV AARL^LAR CTATTAATGA 
2201 AGAAVEGDGD SDGDGAVDQL ETATABQVLD FIDNELGV* 

MonAIII, polyketide synthase multi-enzyme MONS3, housing extension
modules 3 and 4 Length: 4133 amino acids
1 MVSEEKLVDY LKRVSADLHA TRQRLREAEE RGQEPVAWE AACRYPGGIR
51 TPEDLWDLVA AGGNALGAFP DNRGWDLRRL FHPDPDHPGT TYAREGGFLH 
101 DADLPDPEFF GISPREAAVL DPQQRLLLEC AWEALERAGI DPRSLQGSRT 
151 GVYAGAALPG FGTPHIDPAA BGHLVTGSAP SVLSGRLAYT FGLEGPAVTI 
201 DTACSSSLVA VHLAAHALRQ RECDLALAGG VTVMTTPYVF TEFSRQRGIA 
251 ADGRCKPFAA AADGTAFSEG A6LLVLERLS DARRAGHRVL AVIRGSAVNQ 
301 DGASNGLTAP NGPAQQRVIR AALAGARLSP AEVDAVEAHG T6TRLGDPIE 
351 ADALLATYGQ ERHGGRPLWL GSVKSNIGHT QGAAGAAGLI KMVQALRHET 
401 LPATLYADEP TPHADWESGA VRLLSAPVAW PRGEHGEHTR RAGISSFGIS 
451 GTNAHLILEE APAADAEGAG GDGDGDGGGV RPWRVGATG PREEQGQGQG 
501 QEQHQQQRQQ RQRSSMMPTP HLPWLLSARS PAALRAQADA LANHVAHADH " 
551 SIADIGGTLL RRTLFEHRAV VLGTDRDERA AALAALAAGR AHPALTRAAG 
601 PARNGGTAFL FTGQGSQRPG MGRQLYDTFD VFAESLDETC ARLDPLLEQP 
651 LKPVLFAPAD TAQAAVLHGT GMTQAALFAL EVALYRQVTS • FGIAPSHLTG 
701 HSVGEIAAAH VAGVFSLADA GTLVAARGRL MQALPAGGAM LAVQAAEDDV 
751 LPLIAGQEER LSLAAVNGPT AWVSGEAAA VGBVBKALRG RGLKTKRLNV 
801 SHAFHSPLIE PMLDDFREVA RGLTFHAPTL PWSNLTGRL ADAEIiMADAE 
851 YWVRHVRRPV RFHDGLRALS EQGWRYLEL GPDPVLATMV QDGLPAPAEG ■ 
901 EEPEPWAAA LRSKHDEGRT LLGAVAALHT DGQPADLTAL FPADAGQVPL 
951 PTYRFQRRRY WRVAPDAAAP. MIAAGLQETG HPLLPAVIRQ ADGGILLAGR 
1001 LSLRTHPWLA DHTIAGGVPL PATAFVELAL LAGRHAACDT JDDLTLETPl., 
1051 LLDDTGTGVG AAVGAGADAL VDAIEVQLAL GAPDGSGRRA LTVHSRPADD 
1101 AADDGDAADA ADAAGRGGPG GSGDLGDPGD PGDI>GDGGGS RGWRRHATGI 
1151 LSAGPAAEPA APDAAPWPPA DATALDVDAL YARLDAQGYS YGPAFRAVHA 
1201 AWRHGDDLYA DVRLADEQRA EADAFALHPA LLDAALHAVD ELYRGSEGRG 
1251 QEQGQGGQEP EQGRGDADAP VRLPFSFSDI RHHATGATRL WVRLSPQGDD 
1301 RLRLSLTDGE GGQVATVDAL QLRLIPADRW RAARPTTAAP LYHLDWHELP 
1351 LPEPAETDPA AHSWAVLGAH DAGLAPAAHY PDLAALKAAV^ EAGEPVPDIV 
1401 FAPFPAQGTE TDVPAQVRAH ARHALELLRD WLTTEAFAAA*TRLWLTTGAV 
1451 TARPEDGPAD LATAPWGLV RAAQAEQPDH WLVDIDKDI DKDTDEETDQ 
1501 ATDAGTASRH ALPAMAAiSk AQABTQLALR AGTVLVPRLA WPPRTDTPA 

1551, LHATAPESTT DTVDSTGIAG aaesggtvli tqgtgglgqa VARHLAAAHG 

1601 ARHLLLVSRR GDAAEGVAEL RADLADDGVD VRVAACDITD RDALAGLLAD 
1651 IPAAHPLTAV VHTAGVIDDS LITAMTPERL DAVLAPKADA AWHLHELTRD 

1701 KDLSAFVLFS sgasvlgngg qanyaaantf lntlaehrra aglaatsvaw 

. 1751 GLWESASGGM AARLODADRA RIHRTGVTGL TDBQALALPD AALTAEHPTV 
1801 LATRFDRAVL RGQAAARTLQ PALR<3LVRTP RPTASAGAIG STAATGSATD 

. 1851 enapsswaar larlsaadrd ralnelireq IATVLAHPSP DTIELGRAFQ 

1901- elgfdsltal elrnrlstat girlpatlvf dhpsptalvr hlhshlpdea.. 

•1951 qhtsptapga saegtaatat gidddpiaiv gmacrypggv tspeqlwqlv 

2001 atgtdaigpf pedrgwdtag lfdpdpdqvg hsytreggfl ydaarfdagf 

2051 fgispreaaa tdpqqrlllb tawqafehag idpaalrqtp cgvitgimyd 

2101 DYGSRFLARK PD6FEGRIMT GSTPSVASGR VAYTFGLEGP AITVDTACSS 

2151 SLVAMHLAAQ ALRQGECELA LAGGVTVMAT PNTFVEFSRQ RGLAPD6RCK 

2201 PFAAAADGTG WGEGAGLWL BRLSDARRKG HRVLALLRGS AVNQDQASNG 

2251 MTAPNGPSQE RVIRTALAGA GRGPEDIDW EAHGTGTTLG DPIEAQALLA 

•2301 TYX3QGRPEDR PLWLGSVKSN WHTQAAAGV AGViroWMAL' RHEQLPTTLH 
2351 ADEPTPHVQW DG6GVRLLTE PVPWSRGERT RRAGVSSFGI SGTNAHLILE 
2401 EPPEEDLPEP VAAEPGGWP WWSGRTPDA LREQARRLGE FWGAGDVSA 
2451. AEVGWSLATT RSVFEHRAW AGRDRDDLVA GMQALAAGET PTDWSGAAA 
2501 SSGAGPVLVF PGQGSQWVGM GAQLLDESPV FAARIAECEQ ALSAYVDWSL 
2551 SDVLRGDGSE LSRVEWQPV LWAVMVSLAA VWADYGVTPA AWGHSQGEM 
2601 AAACVAGALS LEDAARIVAV RSDALRQLQG HGDMASLGTG AEQAAELIGD 
2651 RPGWVAAVN GPSSTVISGP PEHVAAWAB AEARGLRARV IDVGYASHGP 
2701 QIDQLHDLLT EGLADIRPAN TDVAFYSTVT AERLTDTTAL DTDYWVTNLR 
2751 QPVRFADTIE ALLADGYRLF lEASAHPVLG LGMEBTIEQA DIPATWPTL 
2801 RRDHGDTTQL TRAAAHAFTA GADVDWRRWF PADPTPRTVD LPTYAFQHQH 
2851 YWLEEPSGLT. GDAADLGMVA AGHPLLGACV ELABSDSYLF TGRLSRRAPS 
2901 WLAEHWAGT VLVPGAALVE WVLRAGDEAG CPTIEELTLQ APLVLPESGG 
2951 LQVQWVGAT DEQSGRRDVH VYSRSEQDAS AVWVCHAVGV VSSEMPEAAA 
3001 ELSGQWPPAG AEAVDVEDFY ARAAEAGYAY GPAFQGLRAL WRHGTELPAE 
3051 WLPEQAGGH DGFGIHPALL DAALHPLMLL DRPADGQMWL PFAWSGVSLN 
3101 ADRATHVRVR LSPRGEAAER DLRWIADAT GAPVLTVDAL TLRAADPGRL 
•3151 GAAARGGVDG LYTVDWTPLP LPQPLPLPRT DAGGSADWVI LSDNSSAALA 
3201 DAVSSATAAG GGAPWALLAP VGGGSADDGL PWRRTLSLV QEFLAAPELT 
3251 BSRLVIVTRQ AVATDADGDV AASAAAVWGL IRSAQSENPG RFVLLDVBEB 
.3301 HLHPDGGELP YAALRHAVEB LDEPQLALRS GKFLVPRMTP AAAPEELVPP 
3351 VGTSGWRIiGT SGTATLENLS VIDAPEAFAP LBPGQVRISV RAAGMNFRDV 
3401 LIALGMYPDK QTFAGSEGAG HVTEVGPGVT HLSVGDRVMG LFEGAFAPLA 
.3451 VADARMWPI PEGWSFQEAA AVPWFLTAW YQLVDLQRLR AQESLLIHAG 
3501 TQGVGMAATQ lARHLGAEVF ATAS^^IOIGV LDGMGIDAAH .R^SSEDLDFE; 
3551 ETLRAATGGR GMDWLNSIA GEFTDASLRL LAEC3GRMVDM GKTDKRDPDR 

. 3601. VAAEHAGAWY RAFDLVPHAG PDRIGEMLAE LGELFASGAL APLPVQTWPL 

3651 GRAREAFRFM SQAKHTGKLV LEIPPALDPD GTVLITGGTG VLAAAVAEHL 

3701 VREWGVRHLL LAGRRGSEAP GSSELAEELT ELGABVTFAA ADVSDPDAVA 

3751 BLV6KTDPAH PLTGVIHAAG VLDDAWTAQ TPESLARVWA AKATAAHLLH 

3801 EATREARLGL FLVFSSAAAT LGSPGQANYA AANAYCDALv'^QRRAEGLAG 

■3851 LSIGWGLWQT ASGMTGHLGE TDLARMKRTG FTPLTTEGGL ALLDAARAHG 

3901 RPHWAVDLD ARAVAAQP^ SRPALLRALA AGATPGARTA RRTAAAGSVA 

3951 PAGGLADRLA GLPHPERRilL LLDLVRGNVA GVLGHSDHDA VRPDTSFKBL. 

4001 GPDSLTAVEL RNRLAAATGL KLPAALVFDY PESATLVDHL LERLSPDGAP 

..4051 PPVKDAADPV LNDLGRIESS LDALALDADA RSRVTRRLNT LLSKLNGAAT 

4101 AGSPADVTDL DALDALDDVS DDEMFEFIDR EL* 

MonAIV, polyketide synthase multi-enzyme MONS4, housing extension 
modules 5 and 6 Length: 4039 amino acids 

1 MSSAEESSPD VSGTGVSGTG ESATGTSSTE AKLRQYLKRV TVDLGQARRR 

51 LREVEERAQE PIAIVSMACR FPGDTRTPEA LVTOLVAEGGD AIDDFPTNRG 

101 WDLESLYHPD PDHPGTSYVR RGGFLYDAPA- FDASFFGISP REALAMDPQQ 

•151 RVLMBTAWQL LBRAGIDPAS LKLSATGVYI GAGVLQFGGA QPDKTVEGHL 

201 LTGSALSVLS GRISFTLGLE GPSVSVDTAC SSSLVSMHLA AQALRQGECD 

251 LALAGGVTVM STPGAFTEFS • RQGALSPDGR SKAFAASADG TGFSEGAGLL 

301 LLERLSDAfeR NGHKVLAVIR GSAVNQDGAS NGLTAPNGPS QERVIRAALA 

351 NAGLGAAEVD AVEAHGTGTK LGDPIEAGAL LATYGRDRDE DRPLWLGSVK 

401 SNIGHPQGAA GVAGVIKMVM ALQRELLPAT LYVDBPTPHV DWSSGSVRLL 

451 TEPVPWTRGE RPRRAGVSAF. GMSGTNAHVI LEEAPPEEAA AAETPAEGTG. 

. 501 AWPWWSGR GEEALRAQAA QLAEHVRDDD <3RPASPLBVG WSIATTRSVF 
551 ENRAWVGDD RDALLDGLRS LAAGEASPDV VSQAVQPTQP GPVMVFPGQG 
601 GQWVGMQARL UDESPVFAAR lAECEQALSA YVDWSLTDVL RGDGSELMII 
651 DWQPVLWAV MVALAAVWAD QGIEP2^WG HSQGEIAAAC WGMSLDEA 
701 ARIVAVRSVL LRQLSGRGGM ASLGMGQEQA ADLIDGHPGV WAAVNGPSS 
751 TVISGPPEGI AAWADAQER GLRARAVASD VAGHGPQLDA ILDQLTEGLA 
801 GIRPAATDVA FYSTVTAGHL TDTTELDTAY WVRNVRRTVR PADTIDALLA 
851 DGYRLPIEVS PHPVIiNLALE GLIERAAVPA TWPTLRRDH GDTTQLARAA 
901 AHAFAAGADV DWRRWFPADP APRTVDLPTY AFQRQDFWPA PAGQRSGDPA 
951 GLGIAASGHP LL6ASVGLAS GDVHLLSGRV ' SRQSAAWLDD HWAGQALVP 
1001 GAAQVBWVLR AGDDAGCSAL EELTLQTPLV LPDTGGLRIQ WVBAADAHG 
1051 RRDVRLFSRP DDDDAFASTH PWTCHAT6VL APAPTDGtNG TRDAADTLDG 
1101 AWPPADAEPV PADDLYAQAD RTQYGYGPAP RGVRALWRHG KDVLABVTLP 
1151 KEAGDPDGFG IHPALLDAVL QPAALLLPPT DAEQVWLPFA WNDVALHAVR 
1201 ATTVRVRLTP LGERIDQGLR ITVADAVGAP VLTVRDLRSR PTDTGRLAAA 
1251 ATRDRHGLFD LEWIAPENAA ENAAGPARDA SEGWVTLGED AASIADLIAS • 
1301 VEAGAPAPQL VAAPVEPDRT DDGLALATHV LDLVQTWLAS ■ PLHDSRLVLV 
1351 TRGAVTDADV DVAAAAVWGL VRSAQSEHPO RFTLIDLGPD DTIiAAAMQAA 
1401 HLBBPQLAVH GGEIRVPRLV RATTDPtAPN GTPEADRTAD PSEGLHRNGT • 
1451 VLITGGTGVL GRLVAEHLVT EWGVRHLLLA SRRGDQAPGS ABLRARLSEL 
1501 ■ GASVEIAPAD VGDAEAVAAL IASVDPAHPL TGVIHAAGVL DDAVITAQTP 
1551 ESIARVWATK ATAARHLHEA TRETPLDFFV VFSSAAASLG SPGQANYAAA 
1601 NAYCDALVQH RRAQGIAGLS IAWGLWQATS GMTGQLSETD LARMKRTGFA 
1651 ALTDEGGLAL LDAARAHDRA YWAADLDPR AVTDGLSPLL RALTAPATliR' 

1701 RVASEGLADG alatrlagld adgrlrlltd wrbyvaavl ghgsaarvgv 
1751 DIAFKCLGFD SLTAVBLRNR LSAACDVRLP ATLIFDHPTP QAIATHLVDR 
1801 lAGSTSATTT VNATAPAAAH VAAGADVDAD TDDPVAIVAM TCRFPGGVAS 
1851 PDDLWDLLDA RKDAMGAFPT DRGWDLERLF HPDPDHPGTS YTDQGGFLPD 
1901 AGDFDAAFFG INPREALAMD PQQRLLLEAS WEVLERAGID PTTLKGTPTG 
1951 TYVQLMYHDY AKSFPTADAQ LEGYSYLASJ GSMVSGRVAY TLGLBGPAVT 
2001 VDTACSSSLV SIHLATQALR HQECDLALAG GVTVMADPDM ^AGFSRQRGL 
2051 SPDGRCKAYA AAADGVGFSE GVGVLLLERL SDARRHGRRV LGWRGSAVN 
2101 QDGASNGLTA PNGPSQERITi RQALASGGLS SVDVDWESH GTGTTLGDPI 
2151 EAQALLATYG QGRPEDRPLW .LGSVKSMIGH TQAAAGVAGV IKMVMAMRHG 
2201 WPASLHVDV PSPHVEWDSG AVRLAVESVP WPQVEGRPRR AGVSSFGASG 
2251 TNAHVIVBSV PDGLEEDSVS VGGEALBTET DGRLVPWWS ARSPQALRDQ 
2301 ALRLRDFASD ASFRAPLADV GWSLLKTRAL HEHRAWVGA ERABLIAALB 
2351 ALATGEPHAA LVGPACSQAR VGGDDWWLF SGQGSQLVGM GAGLYERFPV . 
2401 FAAAFDEVCG LLEQPLGVEA GQLRBWFRG PRERLDHTVW AQAGLFALQV 
2451 GLARLWESVQ. VRPDWLGHS IGEIAAAHVA GVFDLADACR WGARARLMG 
2501 GLPEGGAMCA VQATPAELAA DVDGSAVSVA' AVNTPPSTVI SGPSDEVDRI 
. 2551. AGVWRBRGRK TKALSVSHAF HSALMEPMLA EFTEAIRGVK FRQPSIPLMS 
2601 NVSGERAGEE ITDPEYWARH VRNAVLFQPA lAQVADSAGV FVELGPAPVL 
• 2651 TTAAQHTLDB SDSQESVLVA SLAGERPEES AFVEAMARLH TA6VAVDWSV 
2701 LFAGDRVPGL VELPTYAFQR ERFWLS6RSG GGDAATLGLV AAGHPLLGAA. 
2751 VBFADRGGCL LTGRLSRSGV SWLADHWAG AVLVPGAALV EWALRAGDEV 
2801 GCVTVEELML QAPLWPBAS GLRVQVWEE " AGEDGRRGVQ I.YSRPDADAV 
2851 GGDDSWICHA TGVLSPESAR LDTEIiGGVWP PAGAEPUDVD GPYAQAGEAG 
2901 YGYGPAPRGL RAVWRHGQDL LAEWLPEAA GAHDGYGIHP ALIDATLHPL 






WO 01/68867 PCT/GB00/02072 

2951 LiAARFMDGSE DDQLYVPFGW AGVSLRAVGA TTVRVRLRPV GESVDQGLSV 

3001 TVTDATGGPV LSVDSLQTRP VKPSQLAAAQ QPDVRGLFTV EWTPLPQTDA 

3051 DGEADWWLS DGVGRIjADW SAAGGEAPWA WAPVDASVG DGREGLDGRL 

3101 WERVLSLVQ EFLALPELAE SRLLWTRGA VATGVDGDGD VDASAAAVWG 

3151 LVRSAQSENP . GRFILLDVDG DGDDQGPDLN GRHLPHATLR HAAEELDEPQ 

3201 LALREGTLYV PRLTQARQSA ELWPPGEPA WRLRMVHD6S LDALAAVACP 

3251 EALEPLAPGQ VRIAVHAAGI NFRDVLVALG MVPAYGAMGG EGAGWTEVG 

3301 PEVTHVSVGD RVMGVFEGAF GPWIAEARM VTPVPQGWDM REAAGIPAAF 

3351 LTAWYGLVEL AGLKAGERVL VHAATGGVGM AAVQIARHVG AEVFATASPG 

3401 KHAVLEEMGI DAAHRASSRD LAFEGTFREA TGGRGMDWL NSLAGEFIDA 

3451 SLRLLGDGGR FLEMGKTDVR AAEEVAAEHA DVSYTAYDLV GDAGPDRISN 

3501 MLDKLVELFA SERLKPLPVR SWPUDKAQEA FRFMSQAKHT GKLVLEIPPA 

3551 LDPEGTVLVT GGTGALGQW AEHLVREWGV RHLLLASRRG PEAPGSDELA. 

3601 SRLTGLGAEV TIVAADVSDP ASWBLVGKT DPSHPLTGW HAAGVLEDGV 

3651 VTAQTPEGLA RVWAAKAAAA ANLHEATREM RLGLFWFSS AAATLGSPGQ 

3701 ANYAAANAYC DALMQHRRAV GQVGLSVGWG LWEAPDAKPG VAADAKASAA 

3751 TVGKASALSD GTNGSAPQDT TGTAPQGMTG GLTDTDVARM ARIGVKGMSlil 

3801 AHGLALFDAA HRHGRPHLVG FNLDLRTLAT HPLHTRPALL RGLATPTAGG 

3851 ASRPTATAGG QPADLAGRIiA ALSPSDRHHT LVRLIREQAA TVLGHHPDSL 
3901 TTGSTFKELG FDSLTAVEIiR JJRLSAATGLR LPAGLVFDHP DADILABHLG 

3951 AQLAPDGDTP AGAEATDPVL RDLAKLENAL SSTLVEHLDA DAVTARliEAL 
4001 LSNWKAASl^ PGSGSTKEQL QVATTDQVLD FIDKBLGV* . 

MonAV, polyketide synthase multi-enzyme MONS5, housing extension 
modules 7 and 8 Length: 4107 amino acids 
1 MASEEELVDY LKRVAAELHD TRQRLREVED RRQEPVAWG MACRFPGGIE 

51 TPEGLWELVA AGDDAIEPFP TDRGWDLEGI YHPDPDHPGT CYVREGGFLA 

101 APDRPDSDFF GPSPREALAS SPQLRLLLET SWEALERAGI NPASLKGSPT 

151 GVYVGAATTG NQTQGDPGGK ATE6YAGTAP SVLSQRLSFT LGLEGPAVTV 

201 ETACSSSLVA MHLAANALRQ GECDLALAGG VTVMSTPEVE TGFSRQRGLA 

251 PDGRCKPFAA AADGTGWGBG AGLILLERLS DARRKGHKVL IviRGSAINQ 

301 DGASNGFTAP NGPSQRRVIR QALSSAHLST SEIDWEAHG TGTRLGDPIE 

351 AEALIATYGK EREDDRPLWITgSVKSNIGHT QAAAGVAGVI KMVMALQRBL 

401 lpatlnvdep tphvqweggg vrlltepvpw srgbrprrag issfgisgtn . 
451 ahwleeapp eedvpgpvaa epegyvpww sarteealse qarrlgefva 

501 DTDPSTADVG WSLTTSRAIL EHRAVWQRD RDALTAQLAA LAAGEBSADV 
551 VAGVAGDVGP GPVLVFPGQG SQWVGMQAQL LDESPVFAAR lAECEQALSA 
601 YVDWSLSAVL RGDGSELSRV EWQPVLWAV MVSLAAVWAD YGVTPAAVIG 
651 . HSQGEMAAAC .VAGALSLEDA.ARWAVRSDA LRQLMGQGDM ASLGASSEQA ■ 
701 AELIGDRPGV CIAAVNGPSS TVISGPPEHV AAWADAEER GLRARVIDVG • 
751 YASHGPQIDQ LHDLLTORLA DIRPAXTDVA FYSTVTABRL TDTTALDTDY 
801. WVTNLRQPVR PADTIDALLA D6YRLFIEAS -AHPYLGLGME ETIEQADIPA 
. 851 TWPTLRRDH GDTTQLTRAA AHAFTAGATV DWRRWFPADP TPRTIDLPTY 
901 AFQRRSYWLP VDGVGDVRSA GLRRVEHSLL PAALGLADGA LVLTGRLAAS 
951 GGGGGWLADH AVAGTTLVPQ AALVEWALRA ADBAGCPSLE ELXLQAPLVL 
1001 PGSGGLQVQV WGPADGQGG RREVRVFSRV DSDDEAAGQD BGWSCHATGV 
1051 LSPEPGAVPD GLSGQWPPTG AEPLEISDLY EQAASAGYEY GPSFRGLRSV 
1101 . WRHGHNLLAE. VELPEQAGAH DDFGIHPVLL DAALHPALLL DONAPGEEQE' 
.1151 PAQPAI^RLPF VWNGVSLWAT GAATVRVRLA PH<3GGETDDS AGLRVTVADA 
1201 TGAPVLSVDS LALRPADPEL LRTAGRAGSG TNGLFTVEWT ALPPADVADH 

1251 AAGDGWAVLG QDVPDWAGAD MPRHPDMASL SAALDEGTQA PAAVFVETTA 

1301 TSHATPNTAA DVTLDASGRA VABRTLHLLR DWLAEPRLAE TRLVLITHHA 

1351 VTTPADDDW AAPLDVPAAA LWGLIRSAQA EHPDRPVLLD TDAKANTDPG . 

1401 PDTSTDHSTA SGTYRTVIAR ALATGEPQLA VRAGELLAPR LARAATPTPE 

1451 TPTPETQPDT GSGSEAGAGS 6SGPGATLDP DGTVLIAGGT GMMGGLVAEH 

1501 LVRAWSVRHL LLVSRQGPDA PDARDLADRli VGLGATVRIV AADLTDGRAT 

1551 ADLVASVDPA HPLTGVIHAA GVLDDAWTA QTSDQLARVW AAKASVAANL 

1601 DAATSELPLG LFLMFSSAAG VLGNAGQAGY AAAIJAPVDAL VGRRRATGLP 

1651 GLSIAWGLWA RGSAMTRHLD DADLARLRAG GVKPLLDEQG LALLDAARAT 

1701 AAHTSLWAA GIDVRGLNRD DVPAILRDLA GRTRRRAAAD STVDQAALBR 

1751 RLTGIjDEAER RAWTDWRE CVAAVLGHRS aadvrteanf KDLGFDSLTA 

1801 vqlrnrlsaa sglrlpatla fdhptpqala aylgtrlsgr tatpvapvae 
1851 saaatdepva ivamackypg gatspbglwd lvaegvdavg afptgrgwdl 

1901 BRLFHPDPDH PGTSYADEGA ■ FLPDAGDFDA AFFGINPRBA LAMDPQQRLL- 

1951 LEASWEVLER AGIDPTTLKG TPTGTYVGVM YHDYAAGLA^J DAQLEGYSML 

2001 AGSGSWSGR VAYTLGLEGP AVTVDTACSS SLVSIHLAAQ ALRQGEGTLA 

2051 LAGGVTVMAT PEVFTGFSRQ RGLAPDGRCK PFAAAADGTG WGEGVGVLLL 

2101 ERLSDARRHG RRVLGWRGS AVNQDGASNG LTAPNGPSQE RVIRQALAgG 

2151 GLSSVDVDW EGHGTGTTLG DPIEAQALLA TYGQGRPVDR PLl^tiGSVKSN 

2201 IGHTQAAAGV AGVIKMVMAM RHGWPASLH VDVPSPHVEW DSGAVRIAVE 

2251 SVPWPEVBGR PRRAGVSSFG ASGTNAHVIV ESVPDGLGED SVSVSQBAPE 

2301 TETDGRLVPW WSARSPQAL RDQALRLRDA VAADSTVSVQ DVGWSLLKTR 
2351 . ALFEQRAWy GRERAELLSG lAVLAAGEEH PAVTRSREDG VAASGAVVWL 
2401 FSGQGSQLVG MGAGLYERFP VFAAAFDEVC GLLEGPLGVE AGGLRBWFR 
2451 GPRERLDHTM WAQAGLFALQ VGIARLWESV GVRPDWLGH SIGEIAAAHV 
2501 AGVFDLADACRWGARARLMGGLPEGGAMCAVQATPAELAADVDDSGVSV 

2551 AAVNTPDSTV ISGPSGEVDR lAGWRERGR KTKALSVSHA FHSALMEPML 

2601 AEFTEAIREV KFTRPKVSLI SNVSGLBAGE EIASPEYWAR^HVRQTVLFQP 

2651 GIAQVASTAG VP^ELGPGPV LTTAAQHTLD DVTORHGPEP^VLVSSLAGER 

2701 PEESAFVEAM ARLHTAGVAV DWSVLFAGDR VPGLVELPTY aforerfwls 

2751 GRSGGGDAAT LGLVAAGHpl. LGAAVEFADR GGCLLTGRLS RSGVSWLADH 

2801 WAQAVLVPG AALVBWALRA GDEVGCVTVE ELMLQAPLW PEASGLRVQV 

2851 WEEAGEDGR RGVQIYSRPD adavsgddsw ichatgtltp qhtdapndgl 

2901 agawpaagav pvdlagfyer vadagyaygp gfqglravwr hgqdllaew 
2951 lpeaagahdg ygihpallda tlhpallldw pgevqdddgk vwlpftwnqv 
3001 slraagaatv rvrlspgehd eaerevqvlv adatgtdvls vgsvtlrpad 
3051 irqlqavpgh ddglfsvdwt plplsrtdvs qtdadgdadw wlsdgvgsl 
3101 ADWSAAGGE APWAWAPVG ASAGGGLAGF DRREGLDGRL WERVLSLYQ 
3151 eflaapblae srllvltrga vatggdgdgd vdasaaavwg lvrsaqsenp 
3201 GRFILLDVDM DVDVDVDMDV DVDVDVDVDV DGDGNGSDLD PDLNGRRLPH 
3251 atlrhaaeel depqlalrdg qllvprlvra tggglwapt drawrldkgs 
3301 aetlesvapv aypgvmeplg pgqvrlgiha aginfrdvlv slgmvpgqvg 
3351 lggegagwt etgpdvthls vgdrvmgvlh gsfgptavad trmvapvpqg 
3401 wdmrqaaamp vayltawygl velaglkage rvlihaatgg vgmaavqiar 
3451 hlgaevfata saakhwlee mgidaahras srdlafedtf rqatdgrgmd 
3501 WLNSLTGEF IDASiaiLLGD GGRFLEMGKT DVRTPEEVAA BYPGVTYTVY 
3551 DLVTDAOPDR lAVMMSELGE RFASGALDPL PVRSWPLDKA REAFRFMSQA 
3601 KHTGKLVLDV PAPLDPDGTV LITGGTGALG QWAEHLVRB WGVRHLLLAS 

3651 RRGLDAPGSG ELADRLSDLG AEVTVAAADV SDPASWELV GKTDPSHPLT 

3701 GWHAAGVLE DGIVTAQTPB GLARVWAAKA AAAANLHEAT REMRLGLFW 

3751 FSSAAATLGS PGQANYAAAN AYC33AIiMQRR RAAGQVGLSV GWGLWEAPDA 

3801 KPGVAADAKP DVAADAKTGV AADGTPQGMT GTLSGTDVAR MARIGVKAMT 

3851 SAHGLALLDA AHRHGRPHLV AVDLDTRVLA HKPAPALPAL LRAFAGDQ6G 

3901 QGGGRGGGRG GGPARPAAAT TRQNVDWAAK LSVLTAEEQH RTLLDLVRTH 

3951 AAAVLGHAGT DAVPADAAFQ DLGFDSLTAV BLRNRLSAST GLRLPATFIF 

4001 RHPTPSAIAD ELRAQLAPAG ADPAAPLFGE LDKLETVITG HAHDESTRTR 

4051 LAARLQNLLW • RLDDTSARSD HAAGASDADG DAVENRDLES ASDDELFELI 

4101 DRELPS* 

MonAVI, polyketide synthase multi-enzyme MONS6, housing extension 
module 9 Length: 1701 amino acids 

1 MPGTNDMPGT EDKLRHYLKR. VTADLGQTRQ RLRDVEERQR EPIAIVAMAC 

51 RYPGGVASPE QLWDLYASRG DAIEEFPADR GWDVAGLYHP DPDHPGTTYV 

101 REAGFLRDAA. RFDADFFGIN PRBALAADPQ QRVLLEVSWE LFERAGIDPA 
151 • TLKDTLTGVY AGVSSQDHMS GSRVPPEVEG YATTGTLSSV ISGRIAYTFG 

201 LEGPAVTLDT ACSASLVAIH LACQALRQGD CGLAVAGGVT VLSTPTAFVE 

251 FSRQRGLAPD GRCKPFAEAA DGTGFSEGVG LILLERLSDA RRNGHQVLGV 

301 VRGSAVNQDG ASNGLTAPND VAQERVIRQA LTNARVTPDA VDAVEAHGTG 

351 TTLGDPIEGN ALLATYGKDR PADRPLWLGS VKSNIGHTQA AAGVAGVIKM 

401 VMAMRHGELP ASLHIDRPTP HVDWEGGGVR LLTDPVPWPR ADRPRRAOVS 

451 SFGISGTNAH LIVEQAPAPP OTADDAPEGA ATPGASDGLV VPWWSARSP 

501 QALRDQALRL RDFAGDASRA PLTDVGWSLL RSRALEEQRA WAGRERAEL 

551 LAGLAALAAG EEHPAVTRSR EEAAVAASGD WWLFSGQGS QLVGMGAGLY 
601 ERFPVFAAAF DEVCGLLEGE LGVGSGGLRE WFWGPRERL DHTVWAQAGL 

651 FALQVGIiARL WESVGVRPDV VLGHSIGEIA AAHVAGVFDL ADACRWGAR 

701 ARLMGGLPEG GAMCAVQATP AELAADVDGS SVSVAAVNTP DSTVISGPSG 

.751 EVDRIAGWR ERQRKTKALS VSHAFHSALM BPMLGBFTEA IRGVKFRQPS 

801 IPLMSNVSGE RAGEEITSPE YWARHVRQTV LFQPGVAQVA AEARAPVELG 

851 PGPVLTAAAQ HTLDHITEPE GPBPWTASL HPDRPDDVAF ^HAMADLHVA 

901 GISVDWSAYF PDDPAPRTVD LPTYAFQGRR FWLADIAAPE AVSSTDGEEA 

951 GFWAAVEQAD FQALCDTLHL'^KDDEHRAALE TVFPALSAWR RERRERSIVD 

1001 AWRYRVDWRR VELPTPVPGA gtgpdadtgl gawlivapth gsgtwpqaca 
1051 RALEEAGAPV RIVEAGPHAD RADMADLVQA wrascaddtt qlggvlslla 

1101 laeapatssd ttshtstscg tgslashglt gtltllhgll dagveaplwc 
1151 atrgavscgd adplvspsqa pvwglgrvaa lehpelwggl vdlpadpesl 

1201 DASALYAVLR GDGGEDQVAL RR6AVLGRRL VPDATPDVAP GSSPDVSGGA 

1251 AHADATSGEW QPHGAVLVTG GVGHLADQW RWIAASGAEH WLLDTGPAN 

1301 SRGPGRNDDL AAEAAEHGTE LTVLRSLSEL TDVSVRPIRT VIHTSLPGEL 

1351 APLAEVTPDA LGAAVSAAAR LSELP6IGSV BTVLFFSSVT ASLGSREHGA 

1401 YAAANAYLDA LAQRAGADAA SPRTVSVGWG IWDLPDDGDV ARGAAGLSRR 

1451 QGLPPLEPQL ALGALRAALD <3GKGHTLVAD IBWERFAPLF TLARPTRLLD 

1501 GIPAAQRVLD ASSESAEASE NASALRRELT ALPVRERTGA LLDLVRKQVA 

1551 AVLRYEPGQD VAPEKAFKDL GFDSLVWEL RNRLRAATGL RLPATLVYDY 

1601 PTPRTLAAHL LDRVLPDGGA AELPVAAHLD DLEAALTDLP ADDPRRKGLV 

1651 RRLQTLLWKQ PDAMGAAGPA DEEBQAAPED LSTASADDMF ALIDREWGTR 

1701 * 

MonH, probable regulatory protein Length: 981 amino acids 
1 VSGVERGVGS AGPVEQGDGL AGLVERAEAL AALRGAPDGS PGTGGSLWL 
51 SGAVGTGKTA LLRAWADRIG ADADALVLTA TACRAERDLP LGVLEQLVRS 
101 PGLPPASAER ALAWWDBBAS ATPGKTOANG TSAUGTDANG TGAGQTGAGQ 
151 AGVGQTGVGG EPVLAASALR GLCEVLRDLL AER-PVWAVD DAHHADAASL ' 
201 QCLLSWRRL RSARLHVLFT EYAHQKAQNA LLSSEFLHEP ALRRIRLEPL 
251 SKAGV^LA RHLDERTAQP LTPWHGMSA GHPLLVRALA EDHRAAGGAG 
301 E^YGRAVLSF LYRHETPVTQ . VARAIAALGA HAGPGQVGRL LDVDAASVER. 
351 AVRQLTVAEV LHEGRLCHPr FAAAVLDGMP PEE5RALHGR VADLLHEEGA 
401 PATEVAAHLV AADRSDAPWA VPVFQEAAQL ALDEDQVETG VDYLRAAHQR 
451 CRGAAQRAAV VGALADAEWR LDPAKVLRHL PDPAAMAPQT DPAALAPHTD 
501 PAPTAAPTAA PTPTPIPT.TP PLPTHLLWHG RVEEGLDAIG TLTGPGPNPA 
551 GAPPMNPADL DTPWLWGAYL YPGHVKERLG SGALSPQRST PPAVTPELQG 
601 AGTLMNDLLH GGERDAT^ BRALNRYRLG PRTIAVQTAA LAALTYRDRP 
651 HRAAAWCDGL VAQADERNSP TWRALFTAWR ALLHLRQGDP AAAEQRAETA 
701 LALLGSKGWG AAIGLPLAAA VQAKAALGDV DGAAALLERP VPQAVFQTRT 
751 GLHYLAARGR YHLATGCHYA ALCDFYACGT RMSSWGVDLP ALEPWRLGAA 
801 EAYLALGEGL LARQLVDGQL PLPTPDDGRT WGMTLRLRAA TSPAPARAEL 

851 LDEAVAVLRE SGDTFELARA VADQAVAVRE GGBAERARLL ARKAELLARR 
901 WGSAPAPATV PEPPERPGPA TPDAELTSAE RRVAELAAEG FTNREISRKL 
951 CVTVSTVEQH LTRIYRKLDV RRLDLQAALG * 

MonCI, flavin-dependent epoxidase Length: 496 amino acids 

1 VTTTRPAHAV VLGASMAGTL AAHVLARHVD AVTWERDAL PEEPQHRKGV 

51 . PQARHAHLLW SNGARLIEEM LPGTTDRLLA AGARRLGFPE DLVTLTGQGW 
101 <iimFPATQFA LVASRPLLDL TVRQQALGAD NITVRQRT^ VELTGSGGGS 
151 OORVTGWW BLDSGRQEQ^ E^KVID^TG RGSHLKQWIA 
201 WB;.™ RLFmPG^^T IHmVNIA. DDRVKEP6R. OWVPIEGGR 
251 WU.TLSCTRG AQLPTHEDEP IPFABHUB^ IM>I,L^ PI-TPVPCSRS 
301 .aSRHf^PER LKOWPDOI^V IGDS.TA^P XVGHGHSS^ .RCATTIDREF 
351 ER.VQEGTGS iU«^K MGMVDPPW II^TKDIDY VKCRVSATDP 
401 RLIGVDTEQR I^Fi^ITA^ SIRSPK^EI VTBVMSI^ O^'^^^^ 
451 «ERI^ ELTAPPPLPE BLAWGI^ TISPTPTPTP TMVRS 

MonBII, carbon-carbon double bond isomerase Length: 141 amino acids 

1 mpde;^ avd.»eri«^ gdiegvidlf tddivfedfv grppmvgkdb 

51 LRRHI^VS OGTHEVPDPP »PBMDDEFW TPnVTVQRP RFMTFRIVGI 
101 VEU.EHGLGR RVQRFWGVTD VWDDPAGPA DTTHPEGIRA ♦ 

MonBI, carbon-carbon double bond isomerase Length: 144 amino acids 

1 ^^^^ "'^"-^^^ ^"^^ 

51 RM«EP^ H.REE«EPV AGQDATH.LI «ISSV»..P ^^^^^ 
101 I,K«mPGTA RIHRTAMLVI R«DASGLIRH LKSMGTSDL TVI^- 

MonAVIII, polyketide synthase multi-enzyme MONS8, housing extension 
modules 11 and 12 Length: 3754 amino acids 

1 MS^EEKI^H I^AEIRO ^^^'^ ^^'^^^^^ 

51 BDI.«ELVRDG GDAVAG^D RGWDI^X.™ P^^^^*^^'' 

101 GHFDAEFFGI SPREATA»P QQRt^^ETAH EAIEHAGM«P HAI^SDIGV ■ 

151 FTGVSAHD.^ T.ISQTAEOV EGYIGTGHI^ SWSGRIS^ VG^OPAVTV 

201 DTACS8S.VA IHIASQAI^Q GEOSI^ STV«AXPOSP TEPSRQRGIA 

251 POGRCKPFAA AADGTGWGEG AGWAX.-S BARRRGHKV. a™x«Q 

301 DSTS^GI^ NGPCQERVIR AALAHARI^ EDIDAVEAHO TGTTI^^E 
351 AQALIATYGQ GRPEDRPLWL GSVKSNIGHT QAAAGVAGVI KMVMAMRNGL 
401 LPTSLHIDAP SPHYQWEQGS VRLLSEPVDW PAERTRRAGI SAFGISGTNA 
451 HLILEEAPPE EDAPGPVAAE PGGWPWWS GRTPDALREQ ARRLGEPAAG 
501 LADASVSEVG WSLATTRALF DQRAWVGRD LAQAGASLEA LAAGEASADV 
551 VAGVAGDVGP GPVLVFPGQG SQWVGMGAQL LDBSPVFAAR lAECEQALSA 
601 HVDWSLSDVL RGDGSELSRV EWQPVLWAV MVSLAAVWAD TfelTPAAVIG 
651 HSQGEMAAAC VAGALSLEDA ARIVAVRSDA LRQLQGHGDM ASLSTGAEQA 
701 AELIGDRPGV WAAVNGPSS* TVISQPPEHV AAWADAEAQ GLRARVIDVR 
751 YASHGPQIDQ LHDLLTDRLA DIQPTTTDVA FYSTVTAERL DDTTALDTAY 
801 WVTNLRQPVR FADTIEALLA D6YRLFIEAS PHPVLNLGIQ ETIEQQAGAA 
851 GTAVTIPTLR RDHGDTTQLT RAAAHAFTAG APVDWRRWFP ADPTPRTVDL 
901 PTYAFQHKHY WVEPPAAVAA VGGGHDPVBA RVWQAIEDLD IDALAGSLEI 
951 EGQAESVGAL ESALPVLSAW RRRHREQSTV DSWRYQVTWK HLPDVPAPEL 
1001 SGAWLLLVPA AHADHPAVLA TAQTLTAHGQ EVRRHWDAR AMERTELAQE 
1051 LRVLMDGAAF AGWNLIiAliD EEPHPEHSAV PAGLAATTAL VQALADNGAD 
1101 lAVRTLTQGA VSTSAGDALT HPVQAQVWGl. GRVAALEYPR LWGGLVDLPA 
llSl RIDHQTLARL AAALVPQDBD QISIRPSGVH ARRLAHAPAN TVGSGLGWRP 
1201 DGTTLITGGT GGIGAVLARW LARAGAPHLL LTSRRGPDAP GAQELAAELT 
1251 ELGAAVTVTA CDVGDRBQVR RLIDDVPAEH PLTAVIHAAG VPNYIGLGDV 
1301 SGAELDEVLR PKALAftHHLH ELTRBLPLSA FVMFSSGA6V WGSGQQGAYG 
1351 AANHFLDALA EHRRAEGLPA TSIAWGPWAE AGMAADQAAL .TFFSRFGLHP . 
1401 LSPEIiCVKAL QQALDAGETT LTVANFDWAQ FTSTFTAQRP SPLLADLPEN 
1451 RRASAPAAQQ EDATEASSLQ QELTfiAKPAQ QRQLLLQHVR SQAAATLGHS 
1501 DViJAVPATKP PO^LGPDSLT AVELRNRLNK STGLTLPTTV VFDHPTPDAL 
1551 IBVLRAEI^G DAA;^«>PVE lU-^SEGAAD DEPIAIVGMA CRYPSDWSi. 
1601 ^«DLV««5 KDA«<aFPDD RGTOI^IL^ PDPESRSTSY 
1651 SDFDAOrFSI SPREAV»B>P QQRLLLmW EMEi^GI^R EIUCGSD^OV 
1701 PTQLTIFDYL M,VGEQPIEV EaYIGT(»G CmSGEVSYV I/5LEGP*MTI 
1751 BIGCSSSLVA IHQAAHALRQ OECSI^LAGG ATVmIt«SF VBFSWRGIA 
1801 KDGRCKPFAA AAMTGWi^G . VGLWI^S E».imOH«VL ^VIEOSAINQ 
1851 OTLTAP SGQMiORVIR Q1.L*HARLSA BDVTAVMO ^BPIE 
1901 ;^VATYGK. KRPADEPLV& GSIKSHIOHA QASAGVAGVI KMVmLHMEQ 
1951 I,PASLHinRP TPHVDWDGSa VELLSEPVSW PRGEEPERRG VSAFGISGTK 
2001 AHI.ILE9APD APEPVTAPAE DAAAPAGWP WWSARGEEA .EAQAELIAD 
2051 RATADPRLAS PlflVGWSI-VK lESVFENEAV WOKDROTLI, AGI*SLAAGE 
2101 PSPDWEGAV QGASGAGPVI. WPGQGSQW GMQAQLU.ES PVFAARIAEC 
2151 BHALSAHVDW SLSAVLRGDG SEI.SRVEWQ PVLWAVMVSL ASVWAB^GIT 
2201 PAAVIGHSOG EMAAACVAGA I^I^IV AVRSDALRQL «GQG«ASLG 
2251 AGSEQVAELI GBRPGVCVAA V«GPSSTVIS GPPEHVAAW ADAEARGUiA 
2301 RVIBVOYASH GPQI.DQLBDI. LIEEI^IEP TTTDVAFYST VTAERI^DTT 
2351 ^TDVWVTN I..QPVRPADT lEALI^ LFIEASPHPV Lm^lB 
2401 P^^PATWP TI^HGDAA Q.TEAAACAF GAGAEVDWTG «FPAVPLPRV 
2451 VD^PIYAPQR ERFWI^RRG LAGDPAfll^L ASAGHPLI^ AVE^DGGSH 
2501 U,«»ISPRD QAWLAEHRVM. BTVI^POSAF VEIA«2AAW AGCAELAELT 
2551 IHTPLAFGDE GAGAVDVQW VGSVAEDGRR PVTVHSRPTG EGEEAV«IRH 
2601 ,,;«5VVAPPGP DAGDASFGGT WPPPGATPVG EQDPYGEX^S .G^^X^PGSQ 
2651 GI.VSA«RLGI> DLFAEVALPE AESGRADRTO VHPVU^TL HALIU=AVTS 
2701 



TAVDGGGGGB 
2751 PVLTLESLTV RPVAAHQIAG ARAADRDALF RLVWMEVAAR AEETGGGAPR 

2801 AAVLAPVESG PMGGTSAGAL ADALSDALAA GPWDTFGAL RDGVAAGGEA 

.2851 PDWIiAVCAA PGAGAGAVAD ADGRGGDPAG YARLATVSLL SLLKEWVDDP 

2901 AFAATRLVW TRGAVAARPG ETAGDLAGAS LWGLVRSAQA ENPGRLTLLD 

2951 VDGLESSPAT LTGVLASGEP ELALRDGRAY VPRLVRDDAS VRLVPPVGSL 

3001 TWRLARCQBA GGGQQLSLVD APEAGRALEP HEVRVAVRAA "APGPLTAGQV 

3051 EGAGWTEVG GEVGSVAVGD RVM6LFDAVG PVAVTDAALL MPVPAGWSWA 

3101 QAAGSLGAYV SAYHVIADW APRGGETLLV GEETGSVGRA VLRLALAGRW 

3151 RVEAVDGAST ADDSGAERAA DVTLRHEGAL WHRAGGRPD EGQAWPPEP 

3201 GRVREILAEL TELTELAEIT ESABPGLPAE RGDSRALTPL DITVWDIRQA 

3251 PAAMAAPPSA GTTVFSLPPA FDPEGTVLVT GGTGALGSLT ARHLVERYGA 

3301 RHLLLSSRRG ADAP6ALELA ADLSALGARV TFAACDPGDR DBAAALLAAV 

« 

3351 PSDHPLTAVF HCAGTVNDAV VQNLTAEQVE EVMRVKADAA WHLHELTRDA 
3401 DLSAFVLYSS VAGLLGGPGQ GSYTAANAFL DALARHRHDG GAAATSLAWG 
3451 YWEIiASGMSG RLTDADRARH ARAGWGLGA DE6LALLDAA WAGGLPLYAP 
3501 VRLDLARMRR QAQSHPAPAL LRDLVRGGSK SGGGAVSAGA AALLKSLGAM 
3551 SDPEREEALL DLVCTHIAAV L6YDAATPVN ATQGLRELGF DSLTAVELRN 
3601 RLSAATGLKL PATFVFDHPN PAELAAQLRQ ELAPRAADPL ADVLAEFERI 
3651 EDSLLSVSSK DGSARAELAG RLRATLARLD APQDTAGEVA VATRTRIQDA 
3701 SADEIFAFID RDLGRDGASG QGNGQPTGQG NGHGNGNOIG NOIGHGQAVE 
3751 GQR* 

MonAVII, polyketide synthase multi-enzyme MONS7, housing extension 
module 10 Length: 1642 amino acids 

1 MAHTEEKLIil YLKRVTADLR QTERRLQDVE SAGHEPVAVI GMACRLPGGV • • 
51 RSPEEFWELV STGGDAVML POSRUWDLDS LYDPDPESTG TSYVREGGFV 
101 YDAGDFDPTF FGIGPTEAAA MAPQQRLALE TAWEAIERAG IDPLSLRSSD 
151 TSTFIGCDGL DYALGASEVP EGTAGYFTIG NSGSVTSGRV AYTLGLEGPA 
201 VTVDTACSSS LVSLHLATQA LRTQBCSLAL AGGTYVMSSP APLIGFSELR 
251 GLAPDGRCKP FSASSDGMGM AEGTGWLLE RLSDARRKGH KVLAVIRGSA 
301 INQDGASNGL TAPNGPAQER VIRAALANAR LAPEDIDAVE AHGTGTTLGD 
351 PIEAGALISA YGRERPEDRP LWVGAVKSNI GHTQIAAGVA GVIKMVLALR 
401 HDLLPAILHV DAPSPHVEWD GSGLRLLTDP VKWPRGBRPR RAGVSSPGFS 
451 GTNAHLILEE APPEEEDVP^ SVAEEPGGW PWWSGRTPD ALRAQARRLG 
501 EFAAGPADAS AADVGWSLTT TRSVFEHRAV WGRDRDALT AGLGALAAGB 
551 ASAGWAGYA GDVGPGPVLV FPGQGSQWVG MGAQLLDESP VFAARIAECE 
601 RALSAYVDWS LSAVLRGDGS ELSRVEWQP VLWAVMVSLA AVWADYGVTP 
€51 AAVIGHSQGE MAAACVAGAL SLEDAARIVA VRSDALRRLQ GHQDMASLST 
701 GAEQAABLIG DRPGVWAAV NGPSSTVISG PPEHVAAWA DAEARGLRAR 
751 VIDVGYASHG PQIDQLHDLL TERLADIRPA NTDVAFYSTV TAERLTDTTA 
801 • LDTDYWVTNL RQPVRFADTI EALLADGYRL FIEASAHPVL GLGMEETIEQ 
851 ADIPATWPT LRRDHQDTTQ LTRAAAHAFT AGAPVDWRRW FPADPTPRTV 
901- DLPTYAFQHQ HYWLERSASA SGAVSGEQSA AEAQLWHAVE ELDLGLLAET. 
951 LQSBEQSEEA VRALEPALPV LKGWRRRHQD QATIDSWRYR VTWKQRSDGP 
1001 APELGGDWLL FVPADKAEHP AVRATABALS EHGAAAVRLH . PVETGRAGRQ 
1051 ELAAVDTAGL AGIVNLLALD EEPHPEHPAV PAGLAATTAL LQALGDNGTT 
1101 APLHTVTQGA VSTGATDPLT HPLQAHVWGL GRVAALEHPR LWAGLVDLPA 
1151 RIDRHTLPRL AAALLPQDDB DQTAVRPTGI HHRRLTHAVG SIQNPVHSEA 
1201 TWRPRGTTLI TQGTGGIGAV LARWLARQQA PRLHLTSRRG PDAPGARELA 
1251 AELDGLGTAV TITACDVSDP RQLSGLIDDM PAEHPLTAVI HAAqMTDLTJV. 
1301 IGDLTTARLG EVLGSKSDAA WNLHBLTRDL DLSAFVMFSS GAGVWGSGQQ 

1351 GAYGAANHFL DALAEHRRAQ GLPATSIAWG PWAEAGMSAD PESLTYPKRP 

1401 GLLPIAPDLC VKALHQAVDA GDATLTVANF DWAKFTPTFT AQRPSPFLDD 

1451 LPENQREAEQ TGTAAETSAF REEIAKTPAS QRLOFLVQQV RTYA^VATLGR 

1501 TVEDIPAAKP FQELGFDSLT AVQLRNQLNT TTGLSLPATV IFDHPTPEAL 

1551 ATHLRGQLGD GAEVAGEGDV LAALDKWDTA FGAAEVDEAA TRRRIVGRLQV 

1601 LVSKWSPAQD GPEGTDSAHA DLEAASADDI FDLISSEFGK S* 

MonD, cytochrome P450 hydroxylase Length: 431 amino acids 

1 VGLTVQPDNA KRGIVPITDS KPAATFPDLV DPSFWARPHA ERVALFEEMR 

f 

51 GLPRPAFIRQ NMPGVPWTFG YHALVKYADI VEVSRRPQpP SSNGATTIIG 

101 LPPELDBYYG SMINMDNPEH SRLRRIVSRS FGRNMIPEFE AVATRTARRI 

151 IDELIARGPG DFIRPVAAEM PIAVLSDMMG IPABDHDFLF DRSNTIVGPL 

201 DPDYVPDRAD SERAVIEASR ELGDYIAGLR AERLAAPGND LITKLVQVQA 

251 DGEQLTRQEL VSFPILLVIA GMETTRNAIS HALVLLTEHP EQKQLLLSDF 

301 DTHAPNAVEE ILRVSTPINW MRRVATRDCD MNGHRFRRGD RIFLFYWSOT 

351 RDESVFPDPY RFDITRGTNA HVTFGAVGPH VCLGAHLARM EITVLYRELL 

401 AALPQIHAVG QPRRLDSSFI EGIKHLHCAF * 

MonRI, probable activator protein Length: 268 amino acids 

1 VRYEMLGPLR IKDGNDYATI NAQKVEIVLT VLLIRADRW SLEQLMREIW 

. 51 GEDLPRRATA GLHVYISQLR KFLKVPGSA6 NPVETRAPGY VLHKRDDDQI 

101 DAQIFPELVD VGRSLLREKR FDEAASCFGQ ALALWRGPIL GQGGNGPGTN 

151 . GPIIDGFSTW LTEIRIiECQE MLVECQLQLG RHREAVGMLY ALTAENPMCE 

201 AFYRQLMLAL YRSERQADAL KVY^JSVRKTL NDBLGLEPGR PLQELQRAIL • 

251 AGDMHLMSPP PLALSGR* • • ' 
MonAX, thioesterase Length: 278 amino acids 

1 LSAFLAKGKI LSAFPPPDMS DPWIRRPRPR PE^WRLVCF PHAGGSASYY 
51 HPLAQSPTLP TDSEVLAVQY PGRQDRRRER LLDDIGELAD LITDALGPFD 
101 DRPLAFPGHS MGAVLAYEVA QRLRBRTGKQ PCRLFVSGRR APSRFRRGIV 
151 HLLDDTELAA ELRRAGGTDP RFLDDEELIA EIIPWRNDY ^VELYRWNP 
201 SPPLSCPITA LVGDRDPQAP LDBVEAWQQH TEGPFDLKVF AGGHFYLNTH 
251 QQGVTEVISK ALADSAQQRA TARGNAR* 

ORF29, a homologue of CapK involved in cell wall biosynthesis Length: 42 
amino acids 

1 LADLVAHARS ASPYYRELYH GLPERIEDPT LLPVTDKKQL MDHFDDWPTD 
51 RDITFEKVRA FTDDPELIGR RFLGRYLVAT TSGTSGRRGL FVLDDRYMNV 
101 SSAVSSRVLA SWLGPLGIAR AWHGGRFAQ LVATBGHYVG FAGYSRLRQD 
151 GEARSKLVRA FSVHEPMSRL VAELNEYRPA FVIGYASTIM LFTAEQEAGR 
201 LHIDPVLVEP AGBTMTESDT DRIAAAFGAK VRTMYSATEC TYLSHGCAEG 
251 WYHVNDDWAV LEPVDADHRP TPPGEFSHTT LISNLANRVQ PFLRYDLGDS 
301 VMLRPDPCPC GTPSPAIRVQ GRSGDILTFP SGRGDDVSLA PLAFSSLFDR 
351 MPGVELFQIE QTAPSTLRVR WQAPGADAt) ' HVWQRAHDGL THLLADNKLD 
401 NVTVERGEEP PRQASGGKYR TIIPLAA* 

LipB, lipase B Length: 338 amino acids 

1 VKVPVEVTVR LSSWLGGLVA AVLAATVLPA SAASAADVSS PPLEIPAAEL 

51 AKALHCGTEL GDLRDAGDKP TVLFVPGTGL KGEE^AWNY MAE^^KKKGYQ 

101 SCWVDSPGRG LRDMQESVEY WYATRAIQE ATGRKVDLVG HSQGGLLTAW . 

151 ALRFWPDLPG KVDDMVTLGS. PFQGTRLASP CRPIAEVAGC PASVLQFARD 

201 SNWSKALGAD GTPMPAGPSY TTIYSYADES VVADGEAPSL PGAHRIGVQD 
251 ICPGRPWPTH lAMWDQVSY DLVADAIEHP GPADTSRIDR AHCAKPVMPL 
301 NSQEAVDALP GLLNFPIELL IHSQPWVDEE PPLRPYAR 

ORF31, putative ion pump Length: 309 amino acids 



1 MGHDHGPSAG AAGGTLSGTY RKRLLWTIGI SGSITVIQW GALLSGSLAL 
51 IiADAAHSLTD AVGVSLALGA ITLAQRAPTP RRTFGFCRVE IFSAVLNALL 
101 LWIPAWVLW SAIGRFSEPy EVKGGLMFW ALGGLAANLV GLWLLRDAKE 
151 KSLNLRGAYL EVLGDALGSV AVIVGGLVIL LTGWQAADPI ASIVIGLLIV 
201 PRAYGLLRDS LHVLLEATPQ DVDLGEVRRH LLEERGWAV HDLHGWTVTS 
251 GMPVLTAHW VTEEALASGY GBLLGRLQRC VGGHFDVAHS TIQLEPEGHV 
301 EEDGALHT* 









ORF32, hypothetical membrane protein Length: amino acids 

1 MTRALTLHDW IVAGIAWAG WAOLLLRAL LRWLGERASK TRWSGDDVIV 
51 DALRTLVPCA AITAGLAAAA GALPLTPRTG RNVTMTLTAL LILAATLTAA 
101 RIVTGLVKAV AQSRSGVAGS ATIPVNITRV WLAMGFLIV LQTLGISIAP 
151 LLTALGVGGL AVALALQDTL ANLFAGVHIL AAKTVQP6DY IQLSSGEEGY 
201 WDINWRNTT VRQLSNNLVI IPNAKLAGTN MTNYSRPEQE LSIMVQVGVS 
251 YDSDLEQVEK VTTEWDEVM AEIT6AVPDH EAAIRFHTFG DSRISFTVIL 
301 GV6EFSDQYR IKHEFIKRLH QRYRAEGIRV PAPVRIVRVQ QGELPPPLGI 
351 PHQRDTSTQA RLH* 

AmtA, glycine amidinotransferase (partial coding sequence) 
Length: 131 amino acids 

1 MSPVNSHNEW DPLEEIIVGR LEGATIPSSH PWACNIPTW AARLQGLAAG 
51 FEYPQRLIEP AQQELDQFIA LLQSLDVTVR RPAAVDHKHR EGTPDWQSRG 
101 FCNSCPRDSM LWGDEIIET PMAWPCRCFE T 
CLAIMS: 



1. A DNA sequence which is (a) at least part of 
the sequence set out in the appended sequence listing; or 
(b) a variant of a sequence (a) which encodes a 
polypeptide which is at least 80%, preferably at least 
90%, identical with the corresponding peptide as set out 
in table II; provided that it is not a sequence encoding 
all or part of the polypeptide consisting of amino acids 
1-920 encoded by mon AI as set out in table II. 

2. A DNA sequence according to claim 1 comprising 
the complete monensin gene cluster or a variant thereof. 



3. A DNA sequence encoding at least part of at least 
one polypeptide which is necessary for the biosynthesis 
of monensin, and which is encoded by DNA included in the 
appended sequence listing or an allele, mutation or other 
variant thereof; provided that said polypeptide is not 
all or part of amino acids 1-920 encoded by mon AI, as set 
out in table II. 



4. A DNA sequence according to claim 3 which 
comprises at least part of one or more of the following 
genes: mon BI, mon BII, mon CI, mon CII, mon H, mon RI, 
mon RII, mon T, mon AIX and mon AX. 

5. A DNA sequence according to claim 4 comprising 
all of the genes listed therein or an allele, mutation or 
other variant thereof. 

6. A DNA sequence according to claim 3 encoding at 
least part of one or more of the polypeptides set out 
below, said polypeptide having the amino acid sequence as 
set out in the appended sequence data or being a variant 
thereof having the specified activity: 

peptide      activity 
mon CII      epoxyhydrolase/cyclase 
mon E        S-adenosylmethionine-dependent methyltransferase 
mon T        monensin resistance gene 
mon RII      repressor protein 
mon AIX      thioesterase 
mon AI       polyketide synthase multienzyme 
mon AII      polyketide synthase multienzyme 
mon AIII     polyketide synthase multienzyme 
mon AIV      polyketide synthase multienzyme 
mon AV       polyketide synthase multienzyme 
mon AVI      polyketide synthase multienzyme 
mon AVII     polyketide synthase multienzyme 
mon AVIII    polyketide synthase multienzyme 
mon H        regulatory protein 
mon CI       flavin-dependent epoxidase 
mon BII      carbon-carbon double bond isomerase 
mon BI       carbon-carbon double bond isomerase 
mon D        cytochrome P450 hydroxylase 
mon RI       activator protein 
mon AX       thioesterase 



10 



7. A DNA sequence according to claim 6 encoding a
single enzyme activity of a multienzyme encoded by any of
mon AI-mon AVIII or a variant or part thereof.

8. A DNA sequence according to any preceding claim
encoding any one or more of the domains as set out in 
Table I or a variant or part thereof. 



9. A DNA sequence according to any preceding claim
which has a length of at least 30, preferably at least 60,
bases.

10. A recombinant cloning or expression vector
comprising a DNA sequence according to any preceding
claim.



11. A transformant host cell which has been
transformed to contain a DNA sequence according to any of
claims 1-9 and which is capable of expressing a
corresponding polypeptide.

12. A hybridisation probe which is a DNA sequence
according to any of claims 1-9.

13. Use of a probe according to claim 12 to detect a
PKS cluster, optionally followed by isolation of the 
detected cluster. 



14. Use of a probe according to claim 12 which
encodes at least part of a polypeptide having a known
function to detect genes encoding polypeptides having
analogous function.

15. Use according to claim 14 wherein the
polypeptide of known function is AT of module 5 or the
regulatory protein encoded by mon RI.



16. A hybridization probe comprising a
polynucleotide which binds specifically to a region of the
monensin gene cluster selected from mon BI, mon BII, mon
CI, mon CII, mon H, mon RI, mon RII, mon T, mon AIX and
mon AX.



17. Use of a probe according to claim 16 in a method
of detecting the presence of a gene cluster which governs
the synthesis of a polyether, and optionally isolating a
gene cluster detected thereby.

18. Use of a probe according to claim 12 which
comprises a polynucleotide which binds specifically to a
gene responsible for levels of activity of the monensin
gene cluster, in a method of detecting an analogous gene
in a gene cluster for biosynthesis of another polyketide,
optionally followed by a step of manipulating the gene
detected thereby to alter the level of expression of said
other polyketide.



19. Use according to claim 18 wherein the gene is a
regulatory gene, resistance gene or thioesterase gene.

20. Use of the mon RI gene or variant and a monensin
promoter to control expression of a heterologous gene in
S. cinnamonensis.

21. Use of a portion of the monensin gene cluster
encoding a polypeptide having chain terminating activity,
preferably comprising at least one of mon AIX and mon AX
or a mutant, allele or other variant thereof encoding a
polypeptide having chain terminating activity, to effect
chain release of a peptide other than monensin.

22. Use of a portion of the monensin gene cluster
encoding a polypeptide having carbon-carbon double bond
isomerase activity, preferably comprising at least one of
mon BI and mon BII or a mutant, allele or other variant
thereof having isomerase activity to provide a desired
stereochemical outcome in the synthesis of a polyketide
other than monensin.

23. A polypeptide encoded by a portion of the
monensin gene cluster, preferably comprising at least one
of mon BI and mon BII or a mutant, allele or other variant
thereof, having carbon-carbon double bond isomerase
activity, or at least one of mon AIX and mon AX or a
mutant, allele or other variant thereof having chain
terminating activity.

24. An epoxidase enzyme encoded by mon CI or a
derivative or variant thereof having epoxidase activity.

25. A cyclase enzyme encoded by mon CII or a 
derivative or variant thereof having cyclase activity. 

26. Use of a portion of the monensin gene cluster
encoding a peptide having epoxidase or cyclase activity,
preferably comprising mon CI or mon CII or a mutant,
allele or other variant thereof encoding a polypeptide
having epoxidase or cyclase activity to provide a said
activity in the biosynthesis of a polypeptide other than
monensin.

27. A process for producing a polyketide containing
a desired starter unit comprising providing a PKS gene
having a loading module and a plurality of extension
modules, wherein the loading module includes a KSq domain
derived from a KS domain of a monensin extension module.

28. A process according to claim 27 wherein the KSq
domain is derived from KS of module 5 of monensin.

29. A process according to claim 27 or claim 28
wherein the starter unit also includes an ATq domain
derived from an AT domain which is naturally associated
with the KS domain.

30. A DNA sequence comprising DNA encoding at least
one PKS loading module and a plurality of PKS extension
modules, and which can be expressed to produce a
polyketide; wherein at least one of said modules or at
least one domain thereof is a monensin module or domain or
a variant thereof and is contiguous to a further one of
said modules or a domain to which it is not naturally
contiguous; provided that the sequence is not an ery
loading module, the first and second extension modules of
the ery PKS and the ery chain-terminating thioesterase, in
which the DNA encoding AT of the first extension module
has been substituted by DNA encoding an ethyl malonyl-CoA
AT from the monensin gene cluster.



31. A DNA sequence according to claim 30 wherein 
said further module or domain is also a monensin module or 
domain or variant thereof. 



32. A DNA sequence according to claim 30 wherein
said further module or domain is a module or domain of a
PKS of a polyketide other than monensin or a variant
thereof.



33. A DNA sequence according to claim 30, 31 or 32
wherein said loading module is adapted to load a starter
unit other than a starter unit normally received by the
adjacent extension module.



34. A DNA sequence according to claim 33 wherein
said loading module is derived from a monensin extension
module or variant thereof.



35. A polyketide synthase encoded by the DNA 
sequence of any of claims 30-34. 



36. A polyketide compound as produced by a synthase
according to claim 35.

37. A vector containing a DNA sequence of any of
claims 30-34.



38. A transformant cell transformed to contain a DNA 
sequence of any of claims 30-34. 



39. A method of producing S. cinnamonensis capable 
of enhanced levels of production of monensin comprising 
engineering it to overexpress the mon RI gene. 



40. A method according to claim 39 wherein said
engineering comprises introducing at least one additional 
copy of the mon RI gene as shown in the appended sequence 
data or a variant thereof. 



41. S. cinnamonensis containing multiple copies of 
the mon RI gene as shown in the appended sequence data 
and/or variant(s) thereof.



42. A method of producing monensin comprising
culturing the organism of claim 41 and/or an organism
produced by the method of claim 39 or claim 40.



43. A process for expressing a gene heterologous to
S. cinnamonensis comprising transforming S. cinnamonensis
with DNA encoding a heterologous gene and expressing said
gene under control of the activator gene mon RI or
actII/orf4.

44. A process according to claim 43 wherein said
heterologous gene is a PKS gene.

45. 13-Propyl erythromycin A.

SEQUENCE LISTING

1 GATCAGCGCG GTGGCGTCGT CGGCGTCCAG CTCGTTCTGC GTGGCGGACG 

51 GCAGCGCGAT GTCGGCAGGC ACCTCCCAGA CCCGGCGGCC CGGCACGAAQ 

101 CGGQCCGAGQ CGCCGCGGCQ CTGGGCGTAG GTGTCCACGC GGGCGCGTTC 

• 151 GACCTCCTTG ACCTGCTTGA GGAGGTCCAG GTCGATGCCC TTCTCGTCGA 

201 CX3ACGTAACC QQAGQAGTCC GAACACGTCA CGGCGTTGGC GCCCAGGCSCQ 

' • 251 • GCGAGCTTCT GGATGGTGTA GATGGCGACG TTCCCGGAGC CGGACACGAC 

301 CGCCGTCCGG CCTTCGAGGG TCTCGCCGCG CTCACGCAGC ATCGCCX3CCG 

351 CGAAGAGGAC GTTGCCGTAG CCGGTCGCCT CCGGACGGAT CAGGGAGCCG 

401 CCCCAGTTGC GGCCCTTGCC GGTGAGGACG CCCGCCTCCC AGCGGTTGGT 

451 QATGCGCCGG TACTGACCGA ACAGATAGCC GATCTCCCGG CCGCCGACGC 

501 CGATGTCGCC CGCGGGCACG TCCGTGTGTT CGCCGATGTG CCGGTACAGC 

551 TCCGTCATQA ACGACTGGCA GAAACGCATG ACTTCC3QCQT CGCTGCGGCC 

.601 GCGCGGGTCG AAGTCGCTGC GGCCCTTGCC GCCGCCGATG CCGAGGCCCG 

651 TCAGCGCGTT CTTGAAQATC TGCTCGAAGC CCAGGAACTT GATQACGCCG 

701 AGGTTCACCG ACGGGTGGAA GCGCAGGCCG CCCTTGTACG GGCCGAGGGC 

751 GCTGTTGAAC TCCACCCGGA AGCCGCGGTT 6ACCCGCACG CGACCGTGGT 

801 CGTCCTGCCA CGGCACCCGG AAGACGATCT GGCGCTCCGG TTCGCACAGG 

851 CGCTCGATCA GGCCGGCTTC GGCGTACTCG GGGCGAGCCG CGATGACCGG 

901 CGCCAGGGTC TCGAGGACCT CGCGGGCGGC CTGGTGGAAC TCCGGCTGGO * 

951 CCGGGTTGCG GTGTTCGATC TCGGTGAGCA GCTGGGAGAG TGCTGTCTTC 

1001 TOCGAGAGAG CTGTCTTCGT GTCGGGTCGC GTGGTCAAAG GAGCCCTTTC 

1051 TGGCACGGCC GGOGTAGGCG CTCGGCGCCG TTGCCGTGCG CAGGGAGACG 

1101 CTCGAGCCGC AAGTATGAOG CGCATGTAAA CACAGCGACC AGCCCCCGGG 

1151 TCCAQGGAGT GACCACCATO CGAGACCGGG CCACCGGTAG GGCCACCGGT 

1201 CCGGCCTGCG GACCCCGTGT CACTTCCGGC TCGCGGCCAG GGGTGCCGCC

1251 CGGCQGACCG AATCGGCGGA GGCGGCCAGC AGTGGCATGC GGACGGCCGG

1301 GCTGGGAATG CGGTTCTGGG CCTGCAGCAC- TCCCTTGATC ACCGTGGGGT 

1351 TCGGTTCGGT GAAGAGGGCG GCGGAAAGGC GGGCGAGGTC GGCTCCGAGA 

1401 GCGCGGGCGG GTGCGGCGGA GCCGCGTCGC CACAGCGCGA TCATCTCGGC 

1451 GTAGTCGGCG GTACGCAGAT TGGCCGACGC CACGATTCCQ CCGTGGGCGC 

1501 CCGCAGCGAC CAGCGGCGAG AGGACGATGT CGTCACCGCC GAGCACGGCG 

1551 AAGCCGGGCA GGGGCGAGTC GAGCAACTCC ATGGTGGTCG GGTCGATCGA 

1601 GCCGGTCGCG TGCTTGATGC CGACGACCTC CGGCAGGCGG CCGAGTGCGG 

1651 TGATCGTGCC CGCGCCGAGC GTCTGCCCGG TGCGGTAAGG GATGTCGTAC 

1701 ACGACCAGGG GGAGGCCGCC GTGCTCGGCC AQCGCGGCGA AATGAGCCAG 

1751 GGTCCCCGCT TCCCCGGGGC GGATGTA6GG CGGCGCGGGG ACCAGCGCGG 

1801 CGGCGACGTC ACCCCGGGCC GCCAGCTCTC GCAGGGCOGT- GATGGCGGTG 

1851 GCGGTGTCGT TGGTGCCCAC CCCGACGATG AGCGGTGCCC CGTGTGCCCG 

1901 GCACGCGGCC GAGCAGACGC GGATCACCGT CTCTCTCTCC TCGGCGGTCA 

. 1951 GTGTGGCGGC CTCGGCGGTC GTACCGAGGG CGACGAGCCC GGAGGCGCCG 

2001 GCCGACAGCG CCTCGTCGGC GAGTCGGGCC AGCQCCTCGG GGGCCAGGCG' 

•2051 CAGATCGTCG GTGAACGQAG TTACCAGGGG GACX3TACAGG CCJGTTGAAGA 

2101 GCGGTTCGGT GGTCGGTTCG AGGCTCGATG CGAGGGTCAT GCTCTTACCC 

2151 TGGCCCACGC CACTCQGTAG ATCCATTTCA GATTCCTGCC GTCACACCTA 

2201 AGCTGAACTT ATGCTCXIATG TCCGTCXfCCT CCATCTGCTC CGCGAACTCG 

2251 ACCGGCGGGG CACCATCGCC GCCGTGGCCG AAGCGCTGAC CTTCACCGCG 

2301 TCCGCCGTCT CCCAGCAGCT CGGCGTGCTG GAOAGGGAGa CGGGCGTGCC 

2351 GCTGTTGGAA CGCAGCGGCA GGCGCGTGGT CCTCACGCCC GCAGGACGCT 

2401 CCCTCGTCGC ACACGCCGAC GCGGTGCTGA ACCGTCTCGA ACAGGCGGTC ' 

t 

2451 GCOGAGCTGG <X3QGCGCACG GGACGGCATC GGCGGGCCGC TGCGCATCGG
2501 GACGTTCCCT TCCGGGGGCC ACACCATCOT CCCCX3GCGCG CTGGCCGAAC 






SUBSTITUTE SHEET (RULE 26) 



wo 01/68867 

PCT/GBOO/02072 

2551 TQGCCTCTCG TCACCCCGCG TTCGAGCCGA TCOTOCGGGA GATCQACTCC
2601 GCGCGCXSTCT CCGACX3GTCT GOSGaOCGQT GAQCTOGACa TOGCCCTOQT 
2651 ACACX5ACTAC QACTTCQTAC CCGCGACGCC GGACACQACC QTGQACGAGG 
2701 TGCCTCTGCT CGAA6AGCCG ATCTACCTCG. TCACCCATQC CGOQQACACT 
2751 GCCACGGACT CCXilGCTCCGG OAGCACACTC GCaGCGCTGC TCGGGCCCTG 
2801 TOCCGAGGTT CCGTGGATCA CGGCGCGGGA CGGCACGACC QGTCACGCQA 
2851 TGGCTOTACG CQCCTGTCAG GCCGCCGGGT TCCAGCCCAlS QATCCQCCAC 
2901 CAGQTCAACG ACTTCCGCAC GGTGCTSGCT CTGGTCGCCG CCGGGCAGQG 
2951 GGCCGGGTTC QTGCCGCGOA TQOCCacOOA QCCSAQCCCC QCGGQOGTOO 
3001 TGCTCACOAA GCTQCCGCTO TTCCGTCGCT CGAAGGTCGC GTTCCXSTOCO 
3051 GGCGGCGGTG CCCATCCQGC GATCGCCQCT TTCOTOGCOQ CSSOCGACQAG 
3101 GGCQQTCGAA CGCATGQCGO GTOCACQAGG CCCGGCCGGC GGCTCTGAGT 
3151 GAACCGGCCG ACCGTGGQAA TGTGTGTGCC CTOGGCC6CA CCSITTCGTQG 
3201 CCTG6TQACG TCCTGGCGAC GTCCTOJVCGT CCTQATGTCC QAACQAGAAG 
3251 GCGATTTTCC QCGATGGCCG ATGACGCGTA C.CTGTTCCTC CTCCCCGACC 
3301 GGCACCCCCG ACTGQGAGCQ GCCCTCQCCG CCGTCGGTGC CTTOQAATOC 
3351 ACGGAAACCC CTGCGGTGCA CGCCTGQTTO CAGGCTCATQ AGGCCTCCGT 
3401 GTCCTCGOaA CAGGTCAGQA TTqiGCCXXSC .CQATOCCGA6 ACACfCATCC 
3451 CGAAGGACGC 0GA6CGGCTQ COSQTGCCGT aXSAGCGaGGA GGAGGCGCTC 
3501 AAGGTCGAGC AGGAGTGCGC GCCCCAGACC GTCACGGACA TGGAGAGCQA 
3551 ACTGCTCGCX3 TTCCGGGASA CC3ACCCyM3QA CTOQCAGGCC CTCGTOCACC 

« 

3601 GGOCCCTQAC. CGCQOGCATC CCCGCGCAGC GCATCGCCCG GCTGACCGGA 
3651 CTCGACCCGG AGQAGATCGG COGCCTGTAG GGGCTAGCG6 CCGCCCAGTG
3701 CGGACACCafl GATGOCQACC OTGACGGrGT TOAAfiACGAA GGCGATGACC 
3751 GTGTTCGCCG -CCACGGTCGG TOGCATGTCG -CGTOflGGTOA C6TGGACATC 
3801 GGTGGTCC06 AAOGTCGTCA TCOCQOCCAO GQCGAAATAG JMMTAGTCGG
3851 CCXL2W3GCX3GG ACTCCOCTCC CCQGGGAATT CCAGTGCCCG CTCGTTCTCC
3901 ACGAGGTTGT CGGCCTGGAA GGTGACGGCG AAGGCCACGA CCACGCAGAT 
3951 CCAGGCGGCG ACGACCAGGG CGAGCGCCAC CAGGGTGCGG GGOAGCGCGG 
4001 AQAAGGTGGT GCTQAGGTGG CCGGGAAGCC ACAGCACCGC CACCACCAGC 
4051 GCGGCAGCCG CGATGAAGAG CGAACCCCC6 GGGCCGGGCG CGGTTCCGAG 
4101 GACGTAACOC TGCAGQAATG TGCCGCGGGC TTCGCGCCGC GCCCAGG^GC 
4151 GGACCTGCTC CGGAGCGACX5 CTCACGAAGA CGGTCATGGT GATGGCGAGG 
4201 TAGGGCAGCA GGTAGGCGAA GAAGACGAGC ACGCCGACAT CCGCTGCCGA 
4251 AATCCGCACC ACGGCGTCGA TGGGGAGGAC CACTGCCGCG CACGCCGCGA 
4301 CGGCCAGGCT CACCGCCGAC CGGCGCCGTT CQGAAAGCCA GCGATGCACG 
4351 GACGAGCCTC TCTGGTCGGG CGTCGGGCCT CGTGTGATCG TGACCGGCTC 
4401 CGCGCCCGCC GAAAGCGCGG TGCGATCTCC TGCCCTCQAA CGAGCOAAAC 
4451 GCTTGCGCCG GAAAGCCTCC CT6CTGAT0C COACGQCGGC GGCAGTGGCT 
4501. GCGGATGCX3G ATCGTGCGCT GTGCCCTGAC ' CCTGGATGGG GGGAGGAACG 
4551 CAGAGAGGCA GGTGCGCCCA TGAOQGTCAT GGACAAQCTC AAGCAGATGC 
4601 TCAAGGGGCA CGAGGACAAQ GCCGGCCAGG GAATCGACAA GGCGGGCGAC
4651 TTCGTCGACG GGAAGACGCA GGGCAAGTAC AGCGGTCAAG TOGACACGQC 
4701 CCAGGACAAG CTCCOGGACC AGTTCGGCTC QGATCAGCAG GAGCCTCCGC 
4751 MAGGTAGGC AGCGTCAGGG CGGAATCGGT CCGGGCGACC GCTGACCGCT 
4801 OATGCAGATG COGCAGACGT CGGCCdCGCA CTCCTCCGGG TAAATCGGAG 
4851 CGTAGGCGGG GCCGACGTGT GCGCGTGCGG CCTCGTCTCT GCCGCCCCTC 
4901 TCCGCCCCGT CTCTGGCCCC TTGOTGCCAG- TCTGACGGGA AAATGGCACC 
4951 ACTTGGTOCC ACGCATGTGC CATGATGGCG TCATCGAGAG CGCGCTGCCC 
5001 CX3ACTCGCGG GCAGGAAGGG CGCGTTCCX3C GGAGTCX3GCG GTCXSGAGGGG • 
5051 TTGCATCATG GGGACAGCAC AGAGCCAGGA GCAGGCCGCC GCGCGCGGTG • 
5101 CCTGCGCCGC CTTCGTCCGC TTCGTGCTCT GGGGTGGCGG AGTGGGCCTC
5151 GCCTCCAGCT TCGCCX3TGGT CGCCCTCGCC TCGTGGGTTC CCTGGGCGCT

5201 GGCCAACGCC CTGGTCGCCG TGGTCTCCAC .CGTCGTCGCC ACCGAGCTCC - 

5251 ACGCCCGCTT CACCTTCGGT GCGGGCGGGC GCQCGACCTG GC6GCAGGAC 

5301 GCGCAGTCGG CCGGGTCCGC GGCGGCCGCG TACGCGGTQA CCTGCGTGGC 

5351 QATGTTCGTC CTGCAGCAGC TGGTGGCGGC GCCCGGCGCG GTGCTCGAGC 

5401 AGGTCGTGTA CCTOTCQGCX! TCCGCGCTCG CCGGTGTCQC GCGGTTCCSTG 

5451 GTGCTGCGCC TCGTCGTCTT CGCCCGGAAC CGCTCGCTGC CCGCCGCGGC 

5501 CGCCGTGCGC ACCGCGCGTC 'CCGTGCGTCG CGTGCCGGCG CCCGTGCCCG 

5551- CGACCGTGGC CCACGCCGCA TCGCGCCCGG CCGGCCCCGC GGCGCTCTGC 

5601 CCCGCCQCAT GACTCCGTGC CCGCATGTTT GTGCCCCCGG TGCTCCGTGC 

5651 6TCCGGGGGC GGGGTGGGCG TCGTGCCCGG GTGGTCCAGG GGTCACGCQG 

5701 TGGTGTGTGC CAGTTCCTGG CCGAGGTGGT GGGCGAGCTG TGCGGGCGTG 

5751 GGGTTCTCGA GGATGGCGAC CATCGCGATC TCCATCCCX3G TCAGCGTCAT 

5801 CAGTGTCTTG GTGAGCTCAA GGGCGGTGAG GGAGTTGAGA CCGTTCTCGA

5851 GGAAGTTGCT GTCQTCQCTG AGGGTGGTGT TCAGAAGGGT GCCGGCCTGG 

5901 GTGCGGATGG TQTCGGTGAG GAGCTTCTCG CGCTCCTCGG GGGTGGCCGC 

5951 GGCGAGCTGC TTCTCCAGCT CGGTGGCGT.C CTGGCCGGAG GTGTGGTCGG 

6001 TGCTGGTCAT GACTGCTCCT GTGT6AGTGA GGTGTTGGCX3 GGGGTCACAC 

6051 CGCGGCGTGC GCGGTGTGGT CGTGCAGCCA GTAACGCGTG GCCTGGAAGG 

6101 AGTACGTCGG GAGGTCGATG <5TqCGGGGGT GGGGGGTGCG . CCGGACGAGA 

6151 GGGGTCCAGT CGACGGTGCC GCCCGTGGTG TGAAGCCGCG CQAGGGCGGT 

6201 CAACAGGGCG CTTACX3GCGG AGGTTTGCGT GCCTTCCGGT GAGAGCGCGC 

6251 CCyVGGTGGAG AAGCGTGTGG GTCTCGGGGG TGGGGGGTGC GGTGGGGGCC 

6301 GGCGAGGTGA- GGTGGTGGTG CCAGTAGTCG GCGGAGGCGA TGGGGGTGTC 

6351 GGCGGGGGCA GTGCTGGTGA OCGTGAGCQT OGCGCGTTGG AAGGTCAGCT 

6401 GCTTCAGCAC GGGCTCGTAG GOGTCGGGCG GAGCCGGTTG TTCACCCTCX3

6451 GCGGCCTGGG CGGCAGCGGC GTGQGCGGCG GCCAGGCGGC ACGCGTCGTC
6501 GAGQGTCAGG ATTCCCGCGQ CGTACGCGGC GGCGATGTGG CCGACACCGT 
6551 CGCCGGTGAG GGTGTGGGGG CGTACCCCCG TTTCCAGGAG CAGCCGCGCO 
6601 AGCGCGGTGT QGACCGCGAA GCGCGCCAGT TCGGAGTGGG GAGTGGGGAG 
6651 GGGAGTCGGC AGATGGGTGT CGAGGAGCGC GCGCGCTTCG TCGAAGQCGG 
6701 ACX3CGAA0AG CGGGAACQCC GAQTGGAACT CGGCACCTCC GAAAGCc3&G 
6751 CCGAATGTAG CGCCGAATGT CGCGCCGGGT TTGGCTCCGG QTGCCGCCCC 
6801 CGTCGTCACC CCGTCGGCCG GSCGGCCGTC GAAGTGCCAG GCGATCTTCT 

6851 tcgggccggc. cccgggcgtg gacctgacca ggtccgggtg gtcctctccq 
6901 gcggccaggg cgcgggcggc ggcgaggagt toggtgtggt cggtgccggt 
6951 qaggacggcg cggtgttcca gggggctgcg ggtggcggcg agcgagtagq 

7001 CX3ACCTCGGC GGGGGAGGGC GCGGGGTCGG TGGCCGCCAG GTGGGTGACG 
7051 AQGQCCTTCG CCTGTGCCOG CAGQGCCTCG GGTGTACGAG CGGACAGGCT 
7101 CCAGGCCACCGGGAGTTCCG GGGCAACGGG CGACGTCTGG TCGCGGGCGG 
7151 CATCCGGCAC CGGAGCCTCG TCCACCQGCX3 GCTCTTCGAG GATGAGGTGC
7201 GCGTTCGTGC CGGACGTGGC GAAGGCGGAG ATGCCGACCC <3GCGGGGCTC . 
7251 .CTCGCGGCGG GGCCAGTCGA CCGCCTCGGT GAGCAGCCQT ACCGCQCCCT 
7301 TCTTCCAGGC. GGCGAGGGGC GTCGGGCGGT CGACGTGGAG GGTCGGCGGC * 
7351 AGGQTGCCGT GCCGGAACGC CTGGACCATC TTGATOAGCG CGGCCGCACC 
7401 OGCGGCXrCC TGCGTOTGCC CCQTGTTQGft. CTTGACC3GAG CCGAQGCACA 
7451 GGGGCCGGTC GGGGGAGCGG TC<3GCGCCQT AGGTGGCGAS GAGGGCCTOG ' 
7501 ACCTCX5ATQO CQTCGCCGAT GGQGGTGCCC GTCCCGTQCG CCTCGACSOC 
7551 OTOOATCTQa TCCGGGGTGA GCCCGGCGTC GGCGAGGGCG GCGCGGATCA 
7601 CATGCTGCTG GGftGGGQCCG TTGGGGQCGG <XJAGGCCX3TA TCOGGOGCCG 
7651 TCCTGGTTGA CCGCGGAGCC -GOGGATGACG . GCGAGCACCG GGTGGCCGTT . . 
7701 CTTCCTGQCG T0GCC6AGCC GCTCAAGCAG QACQAQGCCG ACGCCTTCAC

7751 CGAGGCCCAT GCCGTCGGCC GCGGCGGCGA ACGGTTTGCA ACGGCCGTCC

7801 TGCGCGAGCG ACTTCTGGTG GGCGAAGGCG TGGAAGGTGT GCGGCGTCGA 

7851 CATGACGGTQ CCGCCGCCGG CGAGGGCGAG GCCGCACTCC CCGCGGCGCA 

7901 GCGCCTGGCA GGCCAGGTGG AGGGCGACCA GGGAGGACGA GCAGGCCGTG 

'7951 TCCACGCTOA TGGCGGGGCC CTCQAGGCCG AGGGCGTAGG CGATGCGGCC 

8001 GGAGACGAGG CTGCCGGACG TGCCGCCGCC CAGATAGGGC AGCAGCtSsT 

8051 CGGGCGCQGT CTCGAGCCGT GTCQCGTAGT CGTGCCCGGT GGCGCCGACG 

8101 TAGACGCCGG TGAGGGTGGA GCGCAGGGTG TGCGGGGCGA TGTGGCCGCG • 

8151 TTCGACGGTC TCCCACGCGA GGTGGAOCAT GAGGCGCTGG AGGGGTTCGG 

8201 TGOCCACGGG CTCGGTGTCG CTGATGTCGA AGAAGCCCGC GTOGAAGCCG 

8251 GCCGCGTCGT CCAGGAACCC GCCGAGCTCC GCGTACGGGC GTTCCTCGGG 

I 

8301 GAGTTCCCAO GCGCX30TCGT CGGGQAAGCC GGTGACGGCG TCGCGGCCCT 

8351 cggaca:ccag atcccacagg tcgtccgggg tgcgggtctt gccgggcagc 

8401 cggcafigcca tggaqacgat: ggcgatcggc tcgtgctgtg cggcctxcag 

8451 ttcgcgcagc tgctgctggg cctggtggag ctcggccgtc gtccacttga 

8501 ggtattcgac gagcttctct tcgttcx3cca cgggaatggt cagccttcct 

8551 gttctcqcgc qtoaagcctc aggtgggacg aggtcgggca aggtgggcag 

8601 GCAGGAGCCG CGCGCTGTGG GTGCCAQGGT CGCCGCG6CT GCTTAAGCGG 

8651 GTCTAACTCC CGCCTTGCOS OCQGGCATCG CCTOGCACGA GCGGGCCAGC 

8701 AGCAGGAGGT CGGCGGCGAT CTCGTCGGGT GCGCCGGCGT GCAGATCOTG 

8751 GTCGGAGCCC GGGTACCAGC OCACGCTCAC CTGCTCCAGG GCCGCCTCGG 

8801 CGGCGGCCAC CCAGGCCCGT ACCTGGTCGG ACAGTTGGGG 6ATGGCGGGG

■8851 ATGAGGGGCA GCAGCCGCAC CGGCACGGTG ACCTTOGQAT ACCAGTCGGC 

8901 CGGTGCCTCC OaTTGCAGGC CGGCGACGAT CGACATGACC TGTGTCGAGG 

8951 TCAGGCGGGG 'GATGAGCAGG OCGTCCGGCC • CQACGCGGTA GTCCGCGAGG 

9001 GGTGCCTCGA TOGAGGTGGG CQACCAGTCG <3GATGGGTGG CCICGCAGGTA

9051 GGCGCGCATG TCGGCGGCGC TGGTGGTGCC CTQCTGGGCG CGCCGGACCA

■ 

9101 CGTCGQCGGT GCQCTCCCAG AAGGCGCGCA TCACCGGTCC GTCGAACTCG 
9151 TACCAGCCGC CGTCGATCAtS GGCGAGACCG GCCACCAGGT CCGQGTGCTC 
9201 GGCCGCCAGG CGCAGCGCGA GQTGCGCGCC CCAGGAGTGC CCGGCCACCA 
9251 GTGCGCCGGA CAGGTCGAGG GCGGTGACGG CCGCCACCAG GTCGGTGACG 
9301 ACC6TCGCGT TGTCGTACCC GTCGGGCGGG .GTGTCCQAGT CQCCGTGScC 
9351 GCGGTQQTOG AC?GGCGTAGG CCGGGTGTCC GGCJGGCGGCG AGACGGGCGG 
9401 CGACCTCGTC CCAGATCCGG GCGTTCGACA GCATGCCGTG CAGCAGCAGG 
9451 AACGGACGOC CCQGGOCTCC CGGCCCGTCC <3CGGGCCGGT ACCTGACATT 
9501 GAQGGAGACG GTCTGCGACA CGGGGATGCG GAGGTTCTTC ACAGGCGGGC 
9551 CCTTGTGATC CCTTGTGCTO GGGGAGGAAA GCGGGGGCGG CACGCTCAGG 
9601 GGCGCTGCGC GGTCGCGAAG ATGTATCCGA GCTCGGGCAT CTTGCCGAGG 
9651 GCCGCCTGGT TGTGCAGGAA . CAGCTCGTAT CCCTCTACQC CGATGATGTC 
9701 . GACGTACTCX3 TCCCGGTGGG CGCGGATCCA CTCGACGTAA CCGTCGTAGG 
9751 TCTTGGCGGT CTCGCGGGTG ATGTCGGTCA GTTCGAGGAC GGTCCAGCCG 
9801 GCGGCGCGGA AGATQTCGGQ GTAGTCCCCG ATGTCGGTGA GCGCGGCGTA 
9851 GATCGTGGTG TCGCTGACGG TGGCGGTCCG GGGCCQGCTG GGATCGGGGT ' 
9901 TQAGGTAGAC CATGTCGGCQ ATCQGCATCC GCQCGqCGGG CTTCACGACG 
9951 CGGTGQGCCT CGGTGAGCAC CTGCTGCTTG TCCGGCATGT GCAGCATGGA 
10001 CTCCAGGGCC CAGCAGTGGT CGAAGGAGCC QTOGTCQAAC QGCAGGTTCA. 
10051 TQGCGTCGAC CTGCTCGAAG CGGACCC<3GT CGGCGAGGCC GGCCTCGCGC 
10101 GCCCGGCOGT TGCCGCGCTC GACCTGGCGG GCGGTGACGG AGATGCCGAC 
10151 CACCTCGACG TCGCGGGCGC OGGCCAGCTG CATGGCOGQG GTGCCGTTCC 
10201 CGCAGCCGAT GTCGAGGACG COGTOGCCdG GGGCCGGGTC GAGGCGGCGG • 

« 

10251 ATCATCTGGT -GGGTCATCTG GACCATCGCC TCGTCGAACG TGGCCTGCTG 
10301 CTGGCCGCCG TCGAACCAGT AGCCGTAGTG CAGATTGCCG TCTCC<3AGCT
10351 GAGTCATCAG GTCGAAGACC TTGTQGTCGT AGTAGTGGCC GATGTCGCTG
10401 GGCTCGGGGG CGACQGTCTT GTTCACCGTC GGGGGCTTCT TGGTCGTCGC 
10451 GTTCTTCGTC ACGGCTTCAG CGTCACCGTG CGGCGGCAGG CGCCACAACC 
10501 CCACCCCCGC CCCTCAAAAG' CCCCTATQGG CCCTCCTCGA CCGCCCCTAG 
10551 GGAGCTGCTC TTGACGCGTT CCATACGGAA CGGGTGGTAC CCCTCCGAAA 
10601 AAAATGAGAG TACGCTCCCA CTAGATATTG AGCTCTCTTT AGGAGGT&A 
10651 CTCCCATGTC TGCTGATCTG GGTGCGCGGC GGTGGTGGGC CGTCGGTGCT
10701 CTCGTACTCG CCTCGATGGT- CGTGGGCTTC GATGTGACGA TCCTGAGCCT 
10751 GGCGTTGCCC GCCATGQCCG ACGACCTCGG CGCGAACAAC GTCGAGCTGC 
10801 AGTGGTTCGT QACGTCGTAC ACGCTGGTGT • TCGCGGCCGG CATGATCCCG 
10851 GCCG6CATGC • TCGGTGACCG QTTCGGACGC AAGAAOGTCC TGCTCACCGC 
10901 CCTGGTOATC TTCGGTATCG CCTCGCT6GC. CT6TGCCTAC GCGACGTCCT 
. 10951 CCGGCACCTT CATCGGCGCG CGTGCGGTQC TCGGTCTGGG C6CCGCGCTG 
11001 ATCATGCCGA CGACGCTGTC GCTGCTGCCG GTCATGTTCT CCGACGAGGA 
11051 GCGGCCGAAG GCCATCQGAG CGGTGGCCGG TGCQGCX3ATG CTCGCCTATC 
11101 CGCTCGGCCC GATCCTCGGC" GGCTACCTGC TGRACO^CTT. CTGGTGGGGC 
11151 TCCGTCTTCC TGATCAACGT GCCQGTGGTG ATCCTCGCCT TCCTCGCGGT 
11201 CTCCGCCTGG CTGCCCGAGT CCAAGGCCAA GGAGGCCAAQ CCQTTCGACA 
11251 TCQGCGGCCT GGTOTTCTCC AGCGTCGGTC TCGCCGCGCT GACCTACGGC 
11301 GTGATCCAGG GCGGCGAGAA GG6CTGGACG GACGTCACCA CGCTGGTGCC • 
11351 GTGCATCGQC GGTCTGCTCG CCCTCGTGCT GTTCGTGATG TGGGAGAAGC 
11401 GGGTGGCGGA CCCGCTGGTC GACCTCTCGC TGTTCCGCTC GGCCCGGTTC 
11451 ACCTCCGGCA CCATGCTCGG C!ACCX3TCATC AACTTCACGA TGTTCGGOGT 
11501 GCTCTTCAC3G ATGCCGCAGT ACTACCAGGC GGTCCTOGGC ACCGACX5CGA 

> 

11551 TGGGCAGCGG CTTCGGGCOTG -CTGCCGATGG TOGGGQGTCT GCTCGTGGOT- 
11601' GTGACQGTCG CCAACAAGGT CQC3CAAGGCC CTCGGCCCGA AGACCGCGGT 

11651- CGGCATCGGC TTCGCCCTCC TCGCCGCCGC CCTGTTCTAC GGCGCCACCA 

11701 CGGACGTCAG CAGCGGCACC GGCCTGGCGG CCGCCTGGAC CGCGGCCTAC 

11751 GGACTCGGCC TCGGCATCGC CCTGCCGACC GCCATGGACG CCGCCCTCGG 

11801 CGCGCTCTCC GAGGACTCCG CCGGC6TCGG ATCCGGCGTC AACCAGTCCA 

11851 TCCGTACCCT CGGCGGCAGC TTCGGCGCGG CCATCCTCGG TTCCATCCTC 

11901 AACTCCOGCT ACCGOGGCAA GCTCGACCTC GACGGCGTGC CCGAGCAd&C 

11951 ACACGGCGCG GTCAAGGACT CCGTCTTCGG CX3GCCTCGCG GTGGCCCGGG 

12001 CGATCAAGTC CAACQ6ACTG GCCGACTCQG TGCGTTCCGC GTACGTCCAC 

12051 GCCCTGGACG TGGTGCTCGT GGTCTCCGGC GGCCTCGGAC TGCTGGG7OT 

12101 GGTQCTGGCG GTGGTGTGGC TGCCCCGCCA TGTTGGTCAG AGCACCGCCA 

12151 AGACAGCAGA ATCTGAGCAT GAAGCXIGCAG ACGCAGTCTG ACCAGGGCAA 

12201 AACAGTGCCT GGTCTGAGAG AACGCAAGAA GGCCCGGACG AAGGCCGCGA 

12251 TTCAGOGGGA GGCGGTGCGC TTGTTCAGGG AACAGGGCTA CACCGCCACG 

12301. ACCATCGAGC AGATCGCCGA AGCCGCCGAG <3TCGCTCCCA GCACCGTCTT 

12351 CCGCTACTTC QCOACCAAGC AGGACCTGGT CTTCTCGCAC GACTACGATC 

12401 TGCCCTTCGC GATGATGGTC CAGGCCCAGT CACCCGACCT GACGCXGATC 

12451 CAGGCCGAGC GGCAGGCCAT CCGiCTCXSATG TTGCAGGACA TGAGCGAGCA 

12501 GGAACTGGCC CTGCAQCGCG AGCGGTTCOT CCTGATTCTC TCCGAGCCGG 

12551 AGCTCXGGGO CGCCAGCCTC GGCAACATCG GCCAGACCAT GCAGATCATG 

12601 AGTGAGCAGG TGGCXIRAACG GGCCGGGCGC GACCCGCGGG ACCGCGCGGT 

12651 CCGCGCCTAC .ACCGGAGCCG TGTTCGGAGT GATGCTCCAG GTCTCGATGG 

12701 ACTGGGCCAA CGATCCGGAC ATGGACTTCG CGACCACGCT GGACGAGGCA 

12751 CTCCACTACC TGGAAGACCT GCGGCCCTGA CCGAAGGGGC GGGCGCACAC 

12801 CACAGAGCCC GCCGCGGCCA GACGTGGTAC <3AGGCGCCAT CGGCCGTCGC 

* 

12851 OTACGACCCC CJOCGCCCCGQ ATTCCCCGGC GGGGCGGGGG GTCAAGGGAA 

12901 AAGAQACGAC C3GCACGCX3GC CACTGTTCCC 0CX3GCTGCCG CGTCCGGTCC

12951 AACCTGGCGT GCTCCGGCTT CCCTCGACGG AGCACGCCAG GGGTCTGTCC

13001 GGCCCTCTCC CGGCGGCTCC CGTCAGACGC CCGGCCCCGC CGTCAGCGCC 

■ 13051 TCGGTCACGA CGGCCGCCAG CTGCTCCTGA CAGCCGTCGA GGTAGAAGTG 

13101 CCCGCCGGGC AGCACCCGCA GATCGAACGC CGCGCCGGTC CGCTCGCGCC • 

13151 ACGTGGCGGC CTGCTCCGGC GACGTCCGCT CGTCGGCGTC CCCGATCAGC 

13201 GCCGTGATCG GGCAGTCGAG CCGGCCGGGT CCCGGCGCCT CGTAGGXdbc 

13251 CACGGCCCGG TAGTCGGCCC GCAGCQCQQG CAQGACOAGC TCCTGCAGCT 

13301 CGGGACTGCG GAAGAACCGC TCGTCGGTGG CGCCCATCGC CCGCAGATGG 

13351 GCCAGGATGT CCGCGTCCCC GAACGCCCCC GAACGCCCCG CGGGACGGTA 

13401 GGGCCGCGCG AGCCCCCCGG AAACGAACAG GTGCACGGGA AGGCCGGGCC 

13451 CGGCCGGTCC CCGCAGCCGC CGCGCCACCT CGAACGCCAC GATGGCGCCC 

13501 AGGCTGTGCC CGAACAGCGC GAATGGCTTC CCGTOGCACG GCAGGTGGGG 

13551 CACGACGCCG TCGGCGAGCT CGGCCACCGA CGCCAGGCAC GGCTCCGCAT 

13601 GACGGTCCTG CCGCCCCQGA TACTGCACGG CGAGCACCTC GACGCCGGGC 

13651 GCGAGCAGCC CGGAGAGCCC GAAGTAGTAA CTCGCCGAAC CGCCCGCGAA 

,13701 CGGAAAGCAG ACCAGCCGCA CCGGCGCCTC TGCCGCAGCG TGGTACCGCC ' 

13751 GCAACCACAC CCCGTTTCCG GTGGCTGCAC CGAACTCGTC ACCGATCTGT 

13801 GGTGCCCGCG CCGCCGTGCC CCTGTCCATC GTTCTCCCTC TCCTCGCGTC 

13851 GCTCCGCGGG CGCTGTCCTG CCCCQCCCCG AAAGCCCQAT GCCGGGCAAG 

13901 CCCCGATGCT GGCCAAACCC CGATGCCGGC O^GCCCCGA TGCTGGCGGC . 

13951 GGCCCATAGC GCCCGGCTAA AGCCGCAGGC GGCTAGCCGG GGTTTGGTTC 

14001 GCCTTTAGAC AGCCCACCCA GGATGAGCCC GGTAGTCGAA GCGATCTCCG 

14051 ATTTCGGACG GGGAGCGCCQ TTQATQTTTT GTG6CAGCCA QTTGTTCAGC ' 

14101 GCCCGACCGC AGCTGACGTG ATGGCCGCAT. CCGCGTCAGC GTCCCCCTCG 

14151 <3GACCGAGOG CftGGACCOGA CCOGATCGCC GTGGTCGGQA TGGCCTGCGG. 

14201 CCTGCOCJGGA GCftCCTGACC CCOAOGCGTT CTCGCGGCTG CTCAGCGAGG

14251 GGCQCAGCGC GGTGAGCACC GCACCGCCCG AGCGGCGGCG AGCCGACTCC

14301 GGCCTCCACG GGCCGGGCGG CTACCTGGAC CGGATCQACG GCTTCGACGC

14351 GGACTTCTTC CACATCAGCC CGCGCGAGGC CGTGGCGATG GACCCCCAGC

14401 AGCGGCTGCT CCTCGAACTO AGCTGGGAGG CCCTCGAAGA GGCGGGGATC

14451 GGGCCGCCCA CCCTGGCGCG CAGCCGCACC GGCGTCTTCG TCGGCGCGTT

14501 CTGGGACGAC TACACCGACG TCCTGAACCT GCGGGCGCCG GGCGCCGfCA

14551 CCCGCCACAC CATGACCGGC GTGCACCGCA GCATTCTGGC CAACCGCATC

14601 TCGTACGCGT ACCACCTGGC CGGTCCGAGC CTCACCGTCG ACACCGCACA

14651 GTCCTCCTCG CTCGTCGCCG TCCACCTGGC CTGCGAGAGC ATCCGCAGCG

14701 GCGACTCCGA CATCGCCTTC GCGGGCGGCG TCAACCTCAT CTGCTCGCCG

14751 CGCACCACCX3 AGCTGGCCGG GGCCCGCTTC GGCGQTGTCT CGGCCGCAGQ

14801 CCGCTGCCAC ACCTTCGACG CCCGCGCCGA CGGTTTCGTA CGCGGCGAGQ

14851 GCGGCGGCCT CGTGGTGCTC AAGCCCCTCG CGGCGGCACG GCGCGACGGC

14901 GACACGGTGT ACTGCGTGAT CCGGGGGAGC GCCQTCAACA GCQACGGTAC

14951 GACCGACGGA ATCACCCTGG CCAGCQGGCA GGCGCAGCAG GACGTGGTGC

15001 GCCTCGCCTG CCGACGGGCG CGGATCACGC CGGACCAGGT GCAGTACGTC

15051 GAACTGCACG GCACCGGCAC GCCCGTCGGG GACCCGATCG AGGCCGCCGC

15101 GCTCGGCGCC GCCCTCGGGC AGGACJGCCGC CCGCGCCGTG CCGCTGGCCG

15151 TCGGCTCCGC CAAGACGAAC GTCGGCCACC TCGAAGCCGC CGCCGOAATC

15201 GTCGGACTGC TCAAGACCGC CCTOAGCATC CAGCACCGGC GGCTGGCGCC

15251 GAGCCTGAAC TTCAGCACCC CCAATCCGGC CATCCCGCTC GCCGACCTCG

15301 GCCTGACCGT CCAGCAGGAC CTGGCCGACT GGCCGCGCCC CGAACAGCCC

15351 CTGATCGCCG GGGTGTCGTC CTTCGGCATG QGCGGCACGA ACGOTCACGT

15401 TGTCGTGGCG GCGGCGCGCG ATTGGGTGGC GGTACCTGAG COGGTGGGGG

15451 TGCCTGAGGG GGTGGAAGTG CCTGASCCQG TGGTGGTTTC TGAGCCGGTG

15501 GTGGTGOCGA CGCCATGGCC CGTGAGCGCT CACAGGQCTT CCGCGCTGCG



15551 CGCGCAGGCC GGTCGCCTGC GGACGCACCT CGCCGCCCAC CGCCCCACCC 

15601 CCGACGCCGC GCGGGTCGGC CACGCGCTCG CCACCACCCG TGCGCCCCTC 

15651 GGCCACCGCG CGGTCCTGCT CGGCGGCQAC ACCGCCGAAC TGCTGGGCTG 

15701 CCTGGACGCG CTGQCCGAQQ GCGCGGAGAC CGCX3TCCATC GTGCGCGGCG 

15751 AGGCGTACAC CGAGGGCAGG ACGGCCTTCC TCTTCAGTGG GCAGGGAGCG 

15801- CAACGCCTCG GCATQGGGCX3 GGAQTTGTAT GCCGTGTTCC CCGTCTTCfec 

15851 CGACGCTCTC GACGAGGCGT TCGCCGCCCT GGACGTACAT CTGGACCGCC 

15901 CACTGCGCGA GATCaTCTTG GGCQAGACCG ACTCGGGTGG GAACGTCTCG 

15951 GGXGAGAATG TCATCGGCGA GGGTGCCGAC . CATCAGGCAC TCCTCGACCA 

16001 GACCGCCTAC ACCCAGCCCG CGCTCTTCGC GATCGAGACG AGCCTGTACC 

16051 GGCTGGCAGC CTCCTTCGGC CTGAAQCCGG ACTACQTCCT CGGCCACTCG 

16101 GTCGGCGAGA TCGCCGCCGC GGACGTCGCC GGTGTCCTCT CGTTGCCGGA 

16151 CGCGAGCQCT CTGGTGGCCA CGCGGGGACG GCTCATGCAG GCGGTTCX3CG 

16201 CGCCCGGCGC GATGGCCGCG TGGCAGGCCA CGGCGGACGA GGCGGCCGAA 

16251 CAGCTCGCCG GGCACGAGCG GCACQTCACC QTGGCCGCCG TCAACGGCCC 

16301 CGACTCCGTG GTCGTCTCCG GCGACCGCGC CACCGTCGAC GAACTGACCG 

16351 CCGCCTGGCG GGGACGCGGC CGCAAGGCCC ACCACCTGAA GGTCA6CCAC 

16401 GCCTTCCACT CCCCGCACAT GGACCCCATC CTCGACGAGC TGCGCGCGGT 

16451 CGCCGCCGGC CTGACCTTCC ACGAGCCGGT CATTCCCGTC GTCTCCAACG . 

16501. ' TCACCGGT0A ACTGQTGACC GCGACCJGCGA CXX3GGAGCX3G COCGGGGCAG ' 

16551 GCGGACCCCG AGTACTGGGC GCGGCATGCG CGCGAGCCCG TGCGGTTCCT . 

16601 GTCCGGGGTG CGGGGGCTGT GCGAGCGCGG GGTGACCACG TTCGTCGAGC 

16651 TCGGCCCGGA CGCACCGCTG TCCGCGATGG CCCGCGACTG CTTCCCCGCC 

16701 COCGCGGACC GGAGCCGXCC GCGCCGGGCC GGCATGGCCA CATGCCGC<:X3 

1'6751 CGGGCGC2GAC QAGGTGGCCA CGTTCCTGAG GTCGCTGGCC CAGGCGTAGG 

16801 TCCGOGGCGC CGATGTCGAC TTCACCCGGG CCTACGGCGC CACCGCCACG

16851 CGCCGCTTCC CCCTCCCCAC QTATCCCTTC CAGCGCQAGC GCCATTGGCC

16901 TGCCGCTGCC GGGGTGGGGO AGCAGCCGGA GACCCCfGGAA CTTCCGGAAT 

16951 CCTCGGAGTC CTCGGAGCAG GCAGGGCATG AGCGGGAGGA G6GGGCGCGC 

17001 GCOTGGGGCG GGCCTGAAGG GCGGCTTGCC GGQCTCTCCG TGAACGACCA 

17051 GGAGCGGGTC CTCCTCGGCC TGGTCACCAA GCACGTGGCC GTCGTGCTCG 

17101 GGGACGCCTC GGGCAOGGTA CAAGCCGCCC GCACCTTCAA GCAGTTOd&C 

17151 TTCGACTCGA TGGCCGCCGC C6AGCTGAGC GAACGGCTCG GCACGGAGAC 

17201 GGGCCTQCCG TTGCCCGCCA CCCTCACCTT COACTACCCQ ACCCCTCTGG 

17251 CCGTCGCCGC GCACCTGCGC GCGGAGCTCA CCGGTACQCC CGCCCCGGCC ' 

17301 GGCTCCGCGC CCGCCACGGG CQCCCTCQGC GCGGGTGACC TCGGCACGGA 

17351 CGAGGACCCG GTCGCCATCG TGGCCATGAQ CTGCCQCTAT CCOGGCGGCG 

17401 CAGGCACGCC CGAGGACCTG TGGCGGCTGG TCGCGGACGG CGCCGACGCG 

17451 ATCGOAGACT TCCCCACCGA CCGCGGCTGQ GACCTGGCGC GGCTGTTCCA 

17501 CCCCGACCCC GACCGGTCGG GCACCAGCTG CACGCGGCAG GGCGGATTCC 

17551 TGTACGACGC CGCCGACTTC GACGCCGAGT TGTTCGACAT CAGCCCQCX3C 

17601 GAGGCCCTGG CCGTCGACCC GCAQCAGCGG CTGCTCCTCG AGTGCGCCTG 

17651 GGAGGCCTTC GAACGGGCGG 6CCTGGACCC GCGGGCGCTC AAGGGCAGCC 

17701 CCACCGGCX3T GTTCGTCGGC ATQACGGGGC AGGACTAC3QG CCCCCGTCTG 

17751 CACGAGCCGT CCCAGGCCAC CGACGGCTAT CTGCTGACCG GCAGCACGCC 

17801. GAGCGTGQCC TCGGGCCGCC TGTCGTTCAQ CTTCGGCCTT GAGGGGCCCG 

17851 CCCTGACGGT GGACACGGCC TGCTCGTCGT CGCTGGTCAC GCTCCATCTC 

17901 GCGGCGCAGG COCTGCGOCQ CQGCXJftGTQC GACCTGGCCC TCGCCGGCGG 

17951 CGCCACCGTC CTGGCCACGC CGGGCATGTT CACCGAGTTC TCX3CX3QCAGC- 

18001 GGGGCCTGGC CCCCGACGGC GGCTGCAAGC CGTTGGCGGC GGGCGCCGAC 

18051 tSGCAOGGGCT GQOCCGAGGG GGTGGGCCTG GTCCTCCTCG AAAGGCTCTC 

18101 CGAGGCCCGG CGCAAGGGGC ACGCCGTCCT GGCGGTGATG CGGGGTTCGG 
18151 CGATCAACCA GGACGGCGCG AGCAACGGCC TGACCGCGCC CAACGGCCCC

18201 TCGCAGCAAC GCGTCATCCG TGCCGCGCTC GCGGCCGCCC GGCTCACCGC

18251 GGACGAGGTC QACGTA6TGG AGGCGCACGG CACCGGCACC ACGCTCGGCG

18301 ACCCGATCGA GGCGCAGGCC CT6CTCGCCA CGTACGGCCA AGGGCGTTCG

18351 GCGGAGCGGC CGTTQTGGCT CGGOTCGGTG AAGTCGAACA TCGGTCACAC

18401 GCAGGCCGCC GCGGGTGTCG CGGGCGTCAT CAAGATGGTG ATGGCGA'fcc

18451 GCCACGACCT GCTCCCCGCC ACCCTQCACG TCGACGAGCC GAGTGGCCAC

18501 GTGGACTGGT CCACCGGCGC GGTGCGACTG CTCACCGAGC CGGTCGTCTG

18551 GCCGCGCGGC GAACGTCCGC GCCGCGCGGC GGTGTCGTCC TTCGGCATCT

18601 CCGGCACGAA CX3CGCACCTG GTGCTCGAAG AGGCGGGGCA GGACGAGTAC

18651 GTTGCGGGAG CCGCCGACGA CGCCGGGCCG GTGGACGGTG CTGTGCTGCC

18701 GTGGGTGQTT TCCGGACGGA CCGGASCGGC GCTGCGCGAA CAGGCCCGCC

18751 GTTTGCGTGA GTTGGTGACC GGCGGCTCGG CCGATGTCTC TGTGTCCGGG

18801 GTGGGCCGGT CGCTGGTCAC CACGCGGGCG GTGTTCGAGC ACCGGGCCGT

18851 GGTCGTGGGC CGCGACCGGG ACACGCTGAT CGGCGGCCTC GAGGCCCTTG

18901 CGGCGGGTGA CGCGTCGCCG GACGTCGTGT GCGGGGTCGC GGGCGATGTC

18951 GGCCCC3GGCC CGGT6CTGGT GTTCCCCGGG CAGGGCTCQC AGTGGGTGGG

19001 CATGGGAGCC CAACTCCTTG GCGAGTCCGC GGTGTTCGCG GCGCGGATCG

19051 ACGCGTGCGA QCAGGCGCTG TCCCCGTACG TCGACTGGTC ACTGACAGAG

19101 GTCCTGCGCG GGGACGGGCG CGAACTGTCG CGCGTCGACG TCGTCCAOCC

19151 CGTGCTGTGG GCGGTGATGG TCTCGCTCGC CGCCGTCTGG GCGGACCACG

19201 GCGTCACCCC GGCOGCCGTC GTOGGGCACT CCCAGGGAGA GATCGCCGCT

19251 GTGGTCGTCG CCGGCGCGCT CACCCTGGAG GACGGCGCCA AGATCGTGGC

19301 CCTGCGCaaC CGGGCGCTGC GTCAGCTCTC GGGCGGGGGC GCGATGGCCT

19351 CCCTGGQGGT GGGCCAGGAA CAGGCAGCCG AACTCGTGGA OGGCCACCCC

19401 <3GAGTGGGCA TGGCGGCCGT CAACGGCCCG TCATCGACCG TCATTTCAGG

19451 CCCGCCCGAG CftAGTCGCCG CCGTCGTCGC CGACGCCGAG GCGCGCGAGC

19501 TGMAGGCCG CGTCATTGAC QTGGACTACG CCTCGCACAG CCCCCAGGTC 

19551 GACGCCATCA CCGACGAACT CACCCACACC CTGTCCGGCG TCCGCCCCAC 

19601 CACGGCCCCG GTGGCGTTCT ACTCGGCCGT GACCGGAACC CGCATCGACA 

19651 CGGCGGGCCT CGACACCGAC TACTGGGTCA CCAACCTGCG CCGCCCGGTC 

19701 CGGTTCGCCG ACGCCGTCAC CGCGCTCCTC GCCGACGGCC ACCGGGTCTT 

19751 CATCGAGGCC AQCAQCCACC CCGTCCTCAC CCTCGGCCTC CAGGAGACCT 

19801 TCGAGGAGGC CGGGGTCGAC GCCGTCACCG TCCCCACCCT GCGGCGCGAG 

19851 GACGGCGGCC GGGCACGCCT GQCCCGCTCG CTGGCACAGG CCTTCGGCGC 

19901 CGGGTGCGCG GTGAGGTGGG AGAACTGGTT TCCGGCCACC GGTACGTCCA 

19951 CCGTGGAQCT GCCQACGTAC GCCTTCCAGC 6TCGCCGTTA CTGGCTGGAG 

20001 GCCCCCACGG GCACCCAGGA CGCGGCGGGC CTGGQCCTCG CCGCTQOGGG 

20051 GCACCCGCTC CTCGGGGCGQ CCACCGAGAT CGCGGACGGC GACATCCGCC 

20101 TGCTCACCGG CCGTATCAGC AGGCACAGCC ACCCCTGGCT CGCTCAGCAC 

20151 ACCCTCTTCG GTGCCGCGGT CGTGCCCGCC TCCGTCCTCG CGGAATGGGC 

20201 GCTGCGCGCC GCCGACGAGG CCGGCTGCCC GCGTGTCGAC GACCTCACGC 

20251 TGCGCACCCC GCTGGTGCTG CCCGAGACCG CGGGCGTGCA GGTGCAGATC 

20301 GTGGTCGGCC CGGCCGAC6C GCGGGACGGG CACCGCGACT TCCACGTCTA 

20351 CGCCCGCCCC GACGGCAftOG ACGCCTCTGA <3GGCGAQaGC ATCGCCGAGG 

20401 GCGAGGGTGC CTCTGAGGGC GAGGGTGCCT CCGGCGGCAC CGATGCGCCG 

20451 TGGACCTGCC ATGCCGACGG CCOACTGOTC GCCGAGCCCA CCGGCACGGC 

20501 CTCGGAGGAC TCCCCGGACA CGGTGTGGCC GCCGCCCGGC GCCGAACCCG 

20551 TCGACCTOGG CGACTTCTAC GAGGGGGCCG CCGCCACCGG AGTCGGGTAT 

20601 GGACCGGTCT TCAOGGGGCT GCOCQOCCTO TGGCG6GGGG ACGGCGAGCT

20651 <3TTC5GCCGAG GCGGTGCTGC CGCAAGAAGC CCCGGAAACC GCCGGGTTCG 

20701 GCATGCACCC <3GCQCTCCTC <3ACGCCGCAC TGCACCCCGC ACTCCTCGGC

20751 GAGCGGCCGG CCGAGGAGGA CAAGQTGTGG CTGCCGTTCA CGCTGACCGG

20801 AGTGACCCTG TGGGCCACCG GTGCCACCTC TGTACGCGTC CGTCTCACCC 

20851 CGCTGGACGA CGACCCCGAC GCGTCGGCGG ACGGGCGGGC CTGGCGGGTC 

20901 GGCGTGAGCG ACCCGACCGG C6CGGAGGTG CTGACCTGCG AGGCCCTGGT 

20951 CGCGGTGGCG GCGGGCCGCC GCGAGCTGCG GGCCGCGGGG GAGCGGGTGT 

21001 CCGATCTGTA CGCX3GTGGAG TGGGTGCCGG TGCCGGGCCC GGGGCCGcJtG 

21051 GGTGAGGGTG CTGACTTCTC GGGCTGGGCC GGTCTGGGGG AGTOCGGGQA 

21101 GCGTTGGQAG TGCGTGGGGC GCGTGGAGCG CTGGTACGAG GACCTGGACG 

21151 CTCTCGGCGC GGCTGTCGAG GGTGGGGCTT CGGTGCCCTC TGTCGTTCTC 

21201 GCCACCQCGG CTGCCGCCCC TQGTGGAGCG GGCGACGGAG CCGCCGATGC 

21251 GCTGAGCGCG GTGCGGTGGA CCGGCGCGCT CCTCGATCAG TGGCTCGGCQ 

21301 ACGCX3CGGTT CQCCX3ACGCC CGQCTGGTGQ TOATCAOGTC CGGCGCGGTC 

21351 GCCACGGGTG ACGATTTCCT TCCCGACCCG GCCGCCGCGG CGGTACGAGG 

21401 ACTOGTCGAQ CAGGCXSCAGG TCAGGGACCC CGGCCGCATC CTCCTCGTCG 

21451 ACACGGAAGC CGGGGCCGGG CTCGGGGTCG GCGCCGGAGT GGATGACGCG 

21501 CTCCTGGAAC AGGCCGTGGC CATGGCTCTC GGCGCCGACG AACCGCAACT

21551 CGCCCTOCGC 0C060GCGQG TCCTGOCGCC CCOCCTCACC GCACCCCAGG 

21601 ATGCGGCCGT GACX:GAAGCG GCGCGACCGC TCGACCCGGA CGGCACCGTA 

21651 CTCATCACAG GGCCGQCCX3G TGCTCCGGTG GCGGACCTCQ CCGAACACCT 

21701 CGTACGCACC GGGCAGTGCA GGCATCTGCT GCTCCTGCCT GGAGACGGTG

21751 AACTGGAGGA AATGGCC6AG GAGTTGCGGG GCCTGGGCGC CACGGTGGAC

21801 CTGAGTACCG CCGACCCGGC GGACCCGACC GCCCTCGCCG AAGTGGTCGC 

21851 CGCCGTCGAG GGGGACCATC CTCTTACXSGG OGTCATCCAC GCCACCGGAG 

21901 TCGTGGAGGC <3TTCGATCCC GGCGACTCGG CGAGCGACTT GATGATCGAC 

21951 TCGGGGAGCG ATTCGTTCGC CGAGGCATGG TCGTCGAGGG CGGGCOTCAC 

22001 CGCCGCACTG CACACOGCOA GCX3CCCACX:T TCCCCTOGAC CTGTTOGCCG 
22051 TCCTGTCCCC GGCGGGCGCG GACCTGGGCA TTGCCCGGTC GGCGGCCGCC

22101 GCGGGCGCCG ACQCCTTCAG CGCGGCACTC GCCCTGCGCC GGCACACX3AC 

22151 CGTCACGACG GACACGACAG CCCCGCCGCG CACGACAGCC CCGCCGCGAA 

22201 CGACAGCCTC GCCGCGCACG ACAGCCCTGX CGTGGTCGCG CACGACGGGC 

22251 GTGGCCCTCG CCTACGGGCC GCCCACCGCG CCGAGGCCCG GCATCAAGGQ 

22301 GACGGCGCCC GGTCGGATCC CCGTGCTGCT CGACGCCGCT CGCGCTCAtG 

22351 OOGGCGGTTC GCCCCTGCTC GGGGCCCGCT TGGCCGCGCG TGCCCTGGCC

22401 GCCGAGTCCG CCGCCGAGGG CGTCGCCGGC CTGCCCGCGC CGCTGCGCGC 

22451 GCTGGCAGTG GCCGCAGCCG OGGCCGGAGC ACCGACCCGG CfGCACCGCCG

22501 CCGACCGCAA GCCCCCCGCG GACTGGCCGG CCCGACTGGC CCCCCTGTCC 

22551 GCCCCCGAAC AACTCCGTCT GCTCATCGAC GCCGTACGCA CCCACQCCGC 

22601 CGCGGTCCTC GGCCGCACCG ACCCGGAAGC GCTGCGCGGG GACGCCACCT 

22651 TCAAGCAGCT CGGCCTTGAC TCGCTGACCQ CCGTGGAGCT OCGCAACCGG

22701 CTCGTGGAGG AGACCGGTCT GCGGCTGCCC ACCGCCCTCG TCTTTCGCTA 

22751 eCCGACCCCC GCGGCGATCG CCGCGCACCT CCGCGAGCGG CTGACCAGCC

22801 OGAGCGAQAC GACCGCCACA CAGAGGTCCG GAGGGCAGAC GCCCGCAGCG 

22851 GGGCAGGCGT CX3TCCGCGCT CGCCCCCGGC GGATCGGCCG CCGGACCGCC 

22901 CX3CCGCAGAC ACOGTGCTGA GCGACCTGAC CGGCATGGAG AACACCCTCT 

22951 CCGTGCTCGC CGCCCAGCTG CCCCAGACCQ AGACGGGTGA GATCACCACC 

23001 CGGCTCGAAG CGCTCCTCAC GCGCTGGAAG ACCACGAAGG CCACGGCGAA 

23051 CGACAGCGGC GACGGCAACG GCGGCQATGA CGACGCCGCC GAACGCCTCA

23101 AGGCCGCGTC CGCCGACCAG ATCTTCGACT TCAtCGACAA CGAGCTTGGT 

23151 GTCGGGCACG ^GCACgTCGCG <::GTGACCCCC ACXCCGAAGG CCGGGTGACC 

23201 GCACATCGCG agtgaagagc aactggtcga atatctgcgc agggtoacca 

23251 gcgagctcca tgacac<3cgt <k3gggcctgg tgcaggagga ggaccgcagg 

23301 <!aggaagogg togccctggt oqgcatggcc tgccgcttcc cqggcggcgt 
23351 GGCCTCACCG GAGGACCTCT GGGACCTGGT CGCCGCGGGC AAGGACGCCA

23401 TCGAGGACTT TCCCACCGAC CQQGGCTGGG ACCTGGAGGC GCTCTACGAC

23451 CCGGACCCGG CCGCGTACGG GACCAGCTAT GTCCGCCACG GCGGGTTCGT

23501 GGACGACOCG GGCTCCTTCG ACGCCGACTT CTTCGGCATC ASCCCGCGAG 

23551 AAGCCCTGGC GATGGACCCG CAGCAQGGGC TGATGCTGGA GACGTCCTGG 

23601 GAGCTGTTCG AGCGCGCCGG CATCGAACCC GTCTCCCTCA AGGGCAGCfcG 

23651 TACGGGC?OTC TACGCCGGGG TGTCCAOCGA GGACTACATQ TCCCAACTGC 

23701 CCCGCATCCC CGAGGGGTTC GAGGGGCACG CCACCACCGG CAGCCTCACC 

23751 AGCGTCATCT CQGGCCGGGT CX3CaTACAAC TACGGCCTCG AAGGCCCGQC 

23801 CGTCACCGTC GACACAGCCT GTTCCGCCTC GCTCGTCGCC ATCCACCTQG 

23851 CGAGCCAGGC GCTGCGCCAG CGTGAGTGCG ACCTCGCCCT CGCGGGCGGT 

23901 GTGCTCGTAC TGTCCAGCCC GCTCATGTTC ACCGAGTTCT GCCGCCAGCXS

23951 GGGCCTTGCT CCCQACGGCC GCTOCAAGCC GTTCGCCGCC GCGGCGQACG

24001 GCACCGGCTT CTCGGAGGGC ATCGGTCTGC TCCTCCTGGA GCGCCTGTCC

24051 GACGCGCGCC GCAACGGCCA CAAGGTGCTC GCGGTGATCC GCGGCTCCGC

24101 CGTCAACCAG QACGGCGCGA GCAAGGGCCT GACCGCCCCC AACGACGCCG

24151 CGCAGGAACA GGTGATCCGC GCCGCCCTCG ACAACGCCCO CCTCACCCOG

24201 TCCGAG^TGG ACGCCGTCGA GGCGCACGGC ACCGGCACCA AACTGGGCGA 

24251 CCCCATCGAG GQCGGAGCGC TGCTCGCCAC CTACGGGCAA CACCGCGCCC

24301 GGGCCCTCCT CCTCGGCTCC CTCAAGTCCA ACATCGGCCA CACCCACGCC 

24351 ACCGCGGQCG TCQCCGGTGT CATCAAGACX: GTdATQGCGA TCCGCAACGG

24401 TCTGCTCCCC GCCACCCTCC ACGTCGAGGA ACTGAGCCCG CACGTCX3ACT 

24451 GGGACGCGGG CGCGGTCGAG GTCGTCACGG AGCCCACCCC GTGGCCCGAG

24501 ACCGGGCACC CCCGGCGCGC OGGCX3TCTCC GOGTTCGGGA TCTCCOGOAC 

24551 GAATGCX3CAC TTGATCCTGG AGGAGGCGCC GCGGGAGGAG GATGTGCGCG

24601 CCCCOGTGGT TGTGGAGTCG GQOGGGGTCG TTCCGTGGGT OGTGTCGGGG 
24651 CGGACGCCGG AGGCGCTGCG TGAACAGGCC CGGCGACTCG GCGAGTTCGT

24701 GGCAGGCGAC ACGGACGCAC TGCCGAACGA GGTCGGCTG6 TCCTTGGCCA 

24751 CGACCCGGTC GGTGTTCGAG CACCGGGCTG TGGTCGTGGG GCGTGACCGG 

24801 GATGCGTTGA CGGCTGGCCT GGGGGCGTTG GCTGCGGGTG AGGCTTCGGC 

24851 GGGTGTGGTQ GCCGGGQTGG CCGGTGATGT GGGTCCTGGG CCGGTGTTGQ

24901 TGTTTCCGGG GCAGGGGGCG CAGTGGGTGG GCATGGGTGC CCAGCTGl'rG 

24951 GACGAGTCTG CGGTGTTCGC GGCGCGGATC GCGGAGTGTG AGCGGGCCCT

25001 GTCGGCGCAT GTGGACTGGT CGCTGAGTGC GGTGTTGCGC GGGGACGGGA 

25051 GTGAQCTGTC CCGGGTGGAA GTGGTGCAOC CGGTGCTGTG GGCGGTOATG 

25101 QTCTCGCTGG CTGCGGTGTG GGCGGATTAC GGGGTCACTC CGGCTGCCGT 

25151 GATCGGGCAC TCGCAGGGTG AGATGGCTGC CGC6TGTGTG QGGGGQGCGC 

25201 TGTCGCTGGA GGATGCGGCG CGGATCGTAG CGGTACGCAG TGACGGGCTT 

25251 CGTCAGCTGC AAGGGCACGG CGACATGGCC TCGCTCAGCA CCGGTGCCGA 

25301 QCAGGCCGCT GAGCTGATCG GTGACCGGCC GGGCGTGGTC GTCGCGGCGQ

25351 TCAATGGGCC GTCGTCTACG GTGATTTCAG GGCCGCCGGA GCATGTGGCA

25401 GCCGTGOTCG CGGATGCXSGA GGCACGTGQT CTGCGCGCCC OTGTCATCGA 

25451 CGTCGGCTAT GCCTCGCATG GCCCCCAGAT CGACCAGCTC CACGATCTGC 

25501 TGACCGAACG CCTQGCCGAC ATCCGGCCCA CGAACACGGA CGTGGCCTTC 

25551 TATTCGACGG TGACOGCCGA GCGCCTGACG GACACCACQG CCCTPOACAC

25601 GGATTACTGG GTCACCAACC TCCGTCAGCC CGTCCGGTTC GCCGACACCA 

25651 TCGAAGCCCT TCTCGCGGAC GGCTACCGCC TGTJT.CATCGA GGCCAGCGCC

25701 CACCCCGTGC TGGGCCTGGG CATGGAGGAG ACCATCGAGC AGGCGGACAT 

25751 GCCCGCCACC GTCGTCCCCA GCCTCCGCCG CGACCACGGC GACACCACCC 

25801 AGCTCACCC3G CGCCGCCGCC CACGCCTTCA CCGCGGGCQC CGATGTCGAC 

25851 TGGCGGCGCT GGTTCCCGGC CGACCCCGCC CCCCGCACGA T^CGATCTCCC

25901 CACCTACGCC TTCCAGCGCC GCCGCTACTQ GCTGQCX3GAC ACAQTGAAQC

25951 GGGACAGCGG ATGGGACCCG GCCGGGTCGG GGCATGCCCA GTTGCCOACC

26001 GCGGTCGCCC TCGCCGACGG GQGAGTGGTG CTGAACGQCC GGGTQTCCGC

26051 CGAGCGCGGT GGCTGGCTGG GCGGGCATGT GGTGGCGGGG ACGGTTCTGG 

26101 TGCCGGGTGC GGCGTTGGTG GAGTGGGTGT TGCGGGCCGG TGATGAGGCG 

26151 GGTTGCCCCT CGCTTGAGGA GTTGACGCTC CAGGCGCCGT TGGTGTTGCC

26201 CGAGTCGGGT GGGTTGCAGG TTCAGQTGGT CGTGGGTGCG GCTGATQjifeC 

26251 AGGQCGGGCG TCGTGACGTA CATGTGTATT CGAGGTCTGA GCAGGACGCG 

26301 TCGGCGGTGT GGCAGTGCCA TGCCGTCGGT GAGCTCGGGC GCGCGTCGGT 

26351 QGCGCGGCCQ GTGCGGCAGG CCGGGCAGTG GCCTCCGGCG GGGGCCGAQC

26401 CGGTGGAGGT GGGCGGCTTC TACGAGGGGG TCGCGGCCGC CGGTTACGAG 

26451 TACGGTCCGG CGTTCCGTGG GCTGCGCGCG ATGTGGCGGC ACG6TGATGA 

26501 CCTCCTTGCG GAGGTCGAGC TGCCGGAGGA GGCCGGTTCG CCGGCC6GTT 

26551 TCX3GCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCACCC GCTGCTCGCA 

26601 CAGCGGAGCC GGG^CGGGGC CGGGGCGGGG GCCCACGGCG GGCAGGTGCT 

26651 GCTGCCTTTC AGCTGGAGCG GTGTTTCCCT GTGGGCCAGC GAGGCCACCA 

26701 CTGTGCGGGT GCGGGTCACC GGGCTGGGAG GAGGQGACGA CGAGACGGTG 

26751 TCCCTGACGG TAACCGACCC CGCCGGTGGC CCCGTGGTGG ACGTGGCAGA 

26801 GCTGCGGTTG CGGTCGACQA GCGCCCGGCA GGTGCGGGGT TCGGCAGGCC 

26851 CCGGCGCGGA CGGGCTCTAC GAGCTGCGGT GGACACCGTT GCCCGAGCCG 

26901 CTTCCCGTAC CGGCCCCCGC GAACGGTCGC GATGTGGCCG CCGACCTGTC

26951 CGQATGCGCQ GTGCTCGGCQ AACTGQTCGC QGAACCGQGC CCGGGCATCG 

27001 ACCTGGAGGG CTGCCCCTGC TACCCGGGCG TCGGCGCGCT CGCCGACAAC

27051 GCCTCCCCGC CCTCCA3X5AT CCTCGCCCCC GTQCACAGCG ACACCACAOG 

27101 CGGCGACGGA CTCGCCCTGA CGGAACGGGT GTTGCGCGTC ATCCAGGACT 

27151 TCCTGGCTGC ACCGAGTCTG GAACAGAAAC AGACGOXICT GGCX:TTCGTG 

27201 ACC!CGGGGCG GGGCGGACAC AGGTAGCACG ACGGGAGGCT CX3GCTGCGCC

27251 QGCAGAGGCA QTC6ACCCGG CGGTCOCQGC CGTATGGG6C CTAGTACGCA

27301 GCGCGCAGTC GGAGAACCCC GGCCGCTTCG TACTGCTGGA CACCGACGCG

27351 CCCCTCGACC AGGCGTCCGT TGCCCCTCTC GTGGACGCGG TGCGGTCTGC

27401 CGTGGAGGCG GACGAGCCCC AAGTCGCCCT GCGCGGGGGA CGGTTGCTCG

27451 TGCCCAGGTG GGCGCGGGCC GGCGAGCCCG TCGAGCTGGC CGGGCCGGCC 

27501 GGAGCGCGGG CGTGGCGGCT GGTGGGCGGA GACTCCGGGA CGCTGGAd&C 

27551 CGTCGTGGCG GAGGCTTGCG ACGACATTGT GCTGCGCCCG TTGGCGCCGG 

27601 GCCAGGTCCG CGTCGCCGTC CATAOGGCCG GGGTCAATTT CCGTGACGTC 

27651 CTGATCGCCC TGGGCATGTA CCCGGACCCG GACQCGCTGC CCGGCACCGA 

27701 GGGGGCCGGC GTGGTGACGG AGGTCGGGCC GGGCGTCACC CGTCTGTCGG

27751 TGGGCGACCG CGTGATGGGC ATGATGGACG GCGCCTTCGG CCCX3TGGGCC

27801 GTCGCCGACG CGCGCATGCT GGCCCCGGTC CCGCCCQGCT QGGGCACCCG 

27851 GCAGGCGGCC GCCGCTCCXIG CCGCQTTCCT GACGGCTTGG TACGGGCTGG 

27901 TGGAGCTGGC CGGTCTGAAG OCGGGCGAGC G1X3TGTTGAT CCATGCCGCC 

27951 ACGQGTGGTG TGGGGATGQC GGCGGTGCAG ATCGCCCGGC ATGTGGGTGC 

28001 CGAGGTGTTC GCCACCGCGA GTCCGGGCAA GCACGGCGTG CTGGAGGAGA 

28051 TGGGCATCGA CGCCGCCCAC CGCGCCTCGT CGCGCGACCT CGCCTTCGAG 

28101 GACGCCTTCC GGCAGGCCAC CGACGGCCQT GGCGTQGACQ TCGTCCTPAA

28151 CAGCCTCACC GGTGAACTGC TCGACGCGTC CCTGCGATTG CTCGGCGACG 

28201 GCGGGCGCTT CGTGGAGATG GGCA?iGAGCG ATCCGCGCGA CCCCGAGCTG

28251 GTCGCGCTGG AGCACCCCGG GGTGTCGTAC GAGGCCTTCG ACGTCGTCGC

28301 CGACGCCGGG CCGGAGCGGC TCGGGCTGAT GCTCGACAGG CTCGGCGAGC

28351 TCTTCGCCGG CGGATCACTG QTACCGCTGC CQGTCACCQC ATGGGCGCTG

28401 GGGCGGGCGC GAGAGGCGCT CCQCCACATG AGTCAGGCGA GGCACACCGG

28451 CAAGCTGGTG CTCGAOGTGC CCGGGCOGCT CGACCCCGAC GGCACCGTCC 

28501 TCGTCACOGG GGGTACCGGC ACCATCGGCG CGGCCGTGGC CGAACACCTG 
28551 GCGCGTACCG GGQAGAGCAA GCACCTGCTC ATCGTCAGCC GCAGGGQGCC

28601 GGCCX3CCCAC GGCGCCGAGG AACTTGTCTC TCGTATAGCC GAGTTCGGGG 

28651 CCG7VAGCCAC CTTCGTCGCT QCCGACGTGA GTGAGCCCGA CGCGQTCGCC 

28701 GCCCTGATCG AAGGGATCGA TCCGGCCCAT CCGCTGACCG GTGTCGTGCA 

28751 TGCCGCCGGA GTACTCGACA ACGCTCTGAT CGGCTCCCAG ACCACCGAAA 

28801 GCCTCACCCG CGTATGGGCG GCGAAGGCCG CCGCCGCGCA GCAACTCcJkc

28851 GAGGCCACGA GGGAGTCGAG GCTGGGACTG TTCGTGATGT TCTCCTCCTT

28901 OGCCTCCACC ATGGGCACCC CAGGGCAGGC CAACTACTCC GCCGCCAACG 

28951 CCTATTGCGA CGCGCTGGCC GCTCTCCGAC GCX3CGGAGGG GCTCGCCGGC

29001 CTGTCCXJTGQ CGTGGGGGTT GTGQGAGGCC ACCAGCGGCC TGACCGGGAC 

29051 GTTGTCGGCG GCCGACCGGG CCCGCATCGA CCGGTACGGC ATCAGGCCGA 

29101 CCAQCGCGGC ACGCGGCTGC GCCCTGCTGG CAGCGGCACG CGCCCACGGG 

29151 CGCCCCGACC TGCTCGCCAT GGACCTGGAC GCCCGCGTAC CCGCCGCGTC 

29201 CGACGCTCCG GTCCCCGCCG TGCTGCGCAC TCTGGCGGCC GCCGGAGCGC

29251 CCGCCACCGC CCGTCCCACC QCG0CX3GCGG CCGCTGACGG GGCQACGGAC 

29301 TGGTCCGGCA GGCTCGCCGG CCTCACCGAG GAGGCACGGC TCGAACTCCT 

29351 CACCGAGTTG GTGTGCACCC ACGCGGCAGG GGTGCTCGGG CACGCCGACG 

29401 CGGGCGCGGT CGAGGTGGAC GCGCCGTTCA AGGAACTCGG CTTCGACTCG 

29451 CTGACCGCCG TCGAACTGCG CAACCGGATC GCCGCCGCGA CCGGCCTGAA 

29501 ACTGCCCGCC GOCCTCGTCT TCGACTACCC <3CAGGCTCGC GTTCTCGCCG

29551 CCCACCTGGC CGAACGGCTC GTCCCGGAGG GCGCGGGGGC CATGGGCGGT 

29601 GTGAGCGGTG OGGAGGGCGT GAGGGACGCG TACGGGGCAG GCGGTCCGGG 

29651 CGGCGACATG ACCGCCCAGG TCTTGCTGGA QGTGGCCCGC GTCGAGCACA

29701 CCCTGTOCGC CGCCGTCCCG CACGGCCTGG ACCGGGCGGC CGTOGCGGCC

29751 CGCCTGGAGG CGCTGCTCGC CXIGCTGCACG GCGACGACGG CGGCCACGGG 

29801 CGCCGCGQGA GCCGOGGTGG AGGQTGACGG CGACAGCGAC GGCGACGGCG 
29851 CCGTGGATCA GCTGQAGACXS GCCACCGCCG AQCAAGTACT GGACTTCATC

29901 GACAACGAAC TCGGGGTGTO AGCCGCQTQC CGGCCGCACA CCAGGCGATC 

29951 ACGGGCGGGG AGCTGCAGCG CACATGGTGA GCGAAGAGAA ACTGGTCGAC 

30001 TACCTCAAGC GTGTCTCCGC GGACCTGCAC GCCACCCGGC AGCGGCTGCG

30051 CGAGGCGGAG GAGCGCGGCC AGGAACCCGT GGCCGTGGTG GAGGCCGCCT 

30101 GCCGCTACCC CGGCGOCATC CGCACCCCCG AAGACCTGTG GGACCTGGTC 

30151 GCCGCGGGCG GCAACGCCCT GGGCGCCTTC CCCGACAACC GCGGCTGGGA 

30201 CCTGCQACGC CTCTTCCACC CCGACCCCGA CCACCGCQGG ACGACCTACG 

30251 CCCGCGAGGG CGGCTTCCTC CACGACQCCG ACCTGTTCGA CCCGGAGTTC 

30301 TTCGGCATCA GCCCCCGCGA GGCCGCGGTC CTCGACCCGC AGCAGCGACT 

30351 GCTCCTGGAG TGCOCCTGGG AGGCACTGGA GCGCGCGGGC ATCGACCCGC 

30401 GGTCCCTCCA GGGCAGCCGT ACCGGCGTGT ACGCGGGTGC CGCCCTGCCC 

30451 GGCTTCQGCA GCCC6CACAT CGACCCCGCC GCCQAGGGCC ACCTGGTCAC

30501 CGGCAGCGCC CCQAGCGTCC TCTCGGGCCG GCTCGCCTAC ACCTTCGGCC 

30551 TCGAAGGGCC CGCGGTGACG ATCGACACCG CCTGCTCGTC GTCGCTCGTC

30601 GCCGTGCACC TGGCGGCCCA CGCGCTGCGG CAGCGCGAGT GCGATCTGGC 

30651 GCTCGCGGGC GGTGTCACCG TCATGACCAC CCCGTACGTG TTCACCGAGT

30701 TCTCGCGCCA GCGCGGCCTG GCCGCCGACG GCCGGTGCAA GCCCTTCGCG 

30751 GCCGCCGCGG ACGGCACGGC CTTCTCCX3AG GGCGCGGGAC TCCTCGTACT 

30801 GGAACGCGTC TCCGACGCCC GCCGQGCCGG CCACCGGGTG CTGGCCGTCA 

30851 TCCGCGGCTC GGCCGTCAAC CAGGATGGCG CGAGCAACGG CCTCACCGCC 

30901 CCCAACGGCC CCGCCCAGCA GCGCGTGATC CGCGCCGCCC TCGCCGGGGC 

30951 GC<3GCTCTCG CCCGCGGAGG TGGACGCGGT OGAGGCGCAC GGCACOGGCA

31001 CCCGGCTGGO CGACCCCATC <3AGOCCGACG CGCTCCTCGC CACCTACGGT

31051 CA<3GAfiGGCC ACGGGGGOOG X3CCGCTCTOG CTGGGCTOGG TGAAATQCAA

31101 CATCGGCCAC ACGCAGGGCG CGGCCGGTGC CGCGGGCCtG ATCAAGATGG

31151 TCCAGOCACT GCGGCACGAG ACGCTGCCCQ CCACGTTGTA CGCCQACGAG

31201 CCCACCCCGC ACGCCGACTG GGAGTCGGGC GCGGTGCGCC TGCTCAGCGC

31251 GCCGGTCGCC TGGCCGCGCG GGGAGCACGG GGAGCACACC CGCAGGGCCG

31301 GCATCTCCTC COTCGGCATC TCCX3GCACGA ACGCCCACCT CATCCTGQAX3

31351 GAGGCGCCCG CGGCCGACGC CGAAGGAQCG GGTGGCGACG GCGATGGCGA 

31401 CQGGGGAfiGQ GTGCGGCCGG TG0TGC20G0T CGGCGCCACG GGCCCCCGCG 

31451 AAGAGCAGGG CCAAGGACAG GGCCAAGAGC AGCACCAACA GCAACGTCAG

31501 CAGCGGCAGC GGTCGTCGAT GATGCCGACG CCGCACCTCC CGTGGCTGCT 

31551 GTCCGCCCGC AGCCCCGCCG CGCTCCGCGC CCAGGCCGAC GCGCTGGCGA 

31601 ACCATGTCGC CCACGCGQAC CACTCCATCG CCGACATCGG CGGCACACTG 

31651 CTGCGCCGCA CCCTGTTCGA GCACCGGGCG GTCQTCCTCG GAACC3GACCG 

31701 TGATGAGCGT GCCGCAGCGC TTGCCGCCCT CGCGGCAGGA CGCGCACACC 

31751 CCGCGCTGAC CCGGGCCGCA GGGCCGGCGA GOAACGGCGG CACCGCCTTC

31801 CTGTTCACCG GCCAGGGAAG CCAACGCCCA GGGATGGGCA GGCAGTTGTA

31851 OOACACCTTC OACGTCTTCO CCOAGTCGCT COACGAGACC TGCGCCCGGC 

31901 TCGACCCCCT GCTCGAACAG CCGCTQAAQC CCGTCCTGTT CGCCCCCQCC 

31951 GACACCGCGC AGGCCGCCGT GCTGCACGGG ACCGGCATGA CGCAGGCCGC

32001 GCTGTTCGCC CTCGAAGTCG CCCTGTACCG CCAGGTCACC TCCTTCGGGA 

32051 TCGCCCCCAG CCACCTGACC GGGCACTCCG TCGGCGAGAT CGCCGCCGCC 

32101 CACGTCGCCG QGGTQTTCTC CCTGGCGGAC GCCTGCACX3C TGGTCGCGGC

32151 CCGGGGCCGC CTCATGCAGG CCCTGGCCGC AGGTGGCX3CC ATGCTCGCCG 

32201 TCCAGGCGGC CX3AGGACGAC GTACTGCCGC TGCTCGCCGG GCAQGAGGAA

32251 CGTCTCTCCC TCGCCGCCGT CAAOQGCCCC ACCGCCGTCG TCGTOTCGGG

32301 TGAGGCCGCT GCCGTCGGGG AGGTGGAGAA GGCGCTGCGC GGGCGCGGAC

32351 TGAAGACCAA GCGQCTCAAC GTCAGTCACG CCT.TCCACESC GCCGCTCATC

32401 GAGCOGATGC TCGACGACTT CGGCX3AAGTG GCCGGCGGGC TGACCTTCCA 

WO 01/68867    PCT/GB00/02072

32451 CGCGCCGACG CTGCCCGTCG TCTCCAACCT CACCGGCCGC CTCGCCGACG

32501 CGGAGCTGAT GGCCGACGCC GAGTACTGGG TGCGGCACGT ACGCCGGCCG

32551 GTGCGGTTCC ACGACGGGCT GCGCGCTCTC AGCGAGCAAG GCGTCGTGCG

32601 CTACCTGGAG TTGGGGCCCG ACCCGGTCCT CGCCACCATG GTCCAGGACG

32651 GTCTCCCGGC CCCGGCGGAG GGAGAGGAGC CCGAGCCGGT CGTCGCCGCG

32701 GCGCTGCGCT CCAAGCACGA C2GAGGGACGC ACCCTGCTGG GTGCCGTCGC

32751 CGCGCTCCAC ACCGACGGAC AGCCGGCCGA CCTCACCGCC CTCTTCCCCG

32801 CCQACGCCGG GCAAGTGCCQ CTCCCCACCT ACCGGTTCCA GCGGCGACGG

32851 TACTGGCGCG TCGCGCCCGA CGCCGCCGCG CCGGCCCGGG CCGCCGGCCT

32901 CCAGGAGACC GGCCACCXIGC TGCTGCCCGC CGTCATCCGG CAGGCCGACG

32951 GCGGCATCCT GCTCGCGGGA CGCCTGTCCC TGCGTACGCA TCCATGGCTC

33001 GCCGACCACA CCATCGCGGG CGGCGTCCCG CTGCCCGCCA CCGCCTTCGT

33051 CGAACTCGdC CTGCTCGCAG GGCGGCACGC CGCCTGCGAC ACGATCGACG

33101 ATCTGACGCT GGAGACGCCG CTGCTGCTCG ACGACACCGG TACCGGTGTC

33151 GGGGCGGCTG TGGGCGCGGG CGCCGATGCC CTCGTCGATG CCATAGAAGT

33201 GCAGCTTGCC CTCGGCGCTC CCGACGGTTC CGGCCGCCGT GCXCTCACCG

33251 TCCACTCCCG TCCTGCCGAC GATGCGGCTG ACGACGGCGA CGCGGCCGAC

33301 GCGGCCGATG CGGCAGGCCG GGGAGGCCCG GOCGGCTCGG GTGACCTGGG

33351 CGATCCTGGC GATCCGGGCG ATCTGGGCGA CGGCGGGGGC TCCCGCGGCT

33401 GGGGCCGTCA CGCCACCGGC ATCCTCAGCG CCGGCGCGGC CGCCGAACCG

33451 GCCGCCCCCG ACGCCGCTCC CTGGCCGCCC GCCGACGCCA CCGCCCTCGA

33501 CGTCQACGCQ CTGTACGCCC GGCTCGACGC GGAGGGGTAC AGCTACGGGG

33551 CCGCCTTCCG GGCCGTCCAC GCCGCCTGGC GGCACGGCGA CGACCTCTAC

33601 ^CGATGTCC GCCTCGCGGA C3GAACAGCGC GCTGAAGCCG ACGCQTTCX3C

33651 CCTCCACCCG GCCCTGCTCG ACGGOGCCCT GCATQCGGTC GACGAGCTGT

33701 AOCGCGGCAG TGAGGGGOGG GGGCAGGA8C AGGGGCAGGG TGGTCAGGAG

33751 CCGGAGCAGG GCCGTGGCGA CGCGGACGCC CCGGTACGGC TGCCGTTCTC

33801 CTTCAGCGAC ATACQCCACC ACGCCACCGG GGCCACACGG CTGTGGGTCC 

33851 GC{/ AGCCC CCAGGGCGAC GATCGGCTGC GGCTGTCCCT GACCGACGGC 

33901 GAGGGCGGGC AGGTCQCGAC AGTCGACGCC CTCCAACTGC QGTTGATCCC 

33951 CGCCGACCGG TGGCGCGCGG CCCGCCCCAC CACAGCCGCC CCCCTGTACC 

34001 ACCTGGACTG GCACGAGCTG CCGTTGCCCG AGCCGGCCGA GACGGACCCG 

34051 GCCGCCCACT CCTGGGCTGT GCTCGGAGCG CACGACGCGG GCCTCGCTCC 

34101 CGCCGCGCAC TACCCGGACC TGGCGGCCCT GAAAGCCGCC GTCGAGGCCG 

34151 GCGAGCCCGT GCCGGACATC GTCTTCGCAC CGTTCCCCGC GCAGGGGACG 

34201 GAGACCGATG TCCCGGCTCA GGTACGAGCC CACGCCCGGC ACGCCCTGGA 

34251 GCTGCTGCGC OACTGGCTCA CCACGOAAOC TTTCQCCQCC GCCCGCCTCG 

34301 TCGTCCTCAC GACCGGTGCX3 GTCACCGCCC GCCCAGAGGA CGGGCCCGCC 

34351 GACCTGGCCA CCGCACCTGT ATGGGGCCTG GTCCGAGCCG CCCAGGCOGA 

34401 ACAACCCGAC CATGTCGTCC TGGTGGACAT CGACAAGGAC ATCGATAAGG

34451 ACACCGACGA OQAGACCGAC CAGGCCACCG ACGCGGGCAC CGCATCGCGC 

34501 CACGCTCTQC CCX3CCQCCTT GQCCGCGGCG GCCX3CCCAAG CCGAGACACA

34551 GCTCGCCCTG CGCGCGGGCA CCGTGCTCGT GCCGCGCCTC GCCGTCGTCC 

34601 CGCCCCGGAC C6ACACCCCA GCGCTGCACG CCACCGCCCC GGAGAGCACC 

34651 ACGGACACTG TGGACTCCAC GGGCATCGCG GGCGCTGCGG AATCCGGCGG 

34701 CACCGTCCTG ATCACCGGCG GAACCGGCGG CCTCGGGCAO GCCGTCGCCC 

34751 GTCACCTCGC CGCCGCGCAT GGCGCCCGCC ACCTGCTCCT OGTCAGCCGC 

34801 AGGGGCGACG CCGCCGAGGG CGTCGCCGAG TTGCGCGCCG ACCTCGCGGA 

34851 CGACX3GCGTC GACGTACGCG TCGCCGCCTG C6ACATCACC GACCGCGACG

34901 CGCTGGCCGG ^GCTCCTCGCG GACATCCCGG <2CGCGCACCC GCTCACCGCG 

34951 GTCGTGCACA CCGCQGGGGT CATCGACGAC AGCCTCATCA CGGCGATGAC 

35001 CCCCGAGCGG CTCGACGOCG TCCXCGCACC CAAGGCCX3AC GCGGCGTGGC 

35051 ACCTGCACGA ACTCACCCGC GACAAQGACC TGTCGGCCTT CGTCCTCTTC

35101 TCCTCGGGCG CCTCCGTCCT CX3GCAACGGC GGCCAGGCCA ACTACGCGGC

35151 \CGCCAACACC TTCGTCAACA CCCTCGCCGA ACACCGCCGC GCGGCCGGCC

35201 TCQCCGCCAC CTCCQTGGCC TQGGGCCTGT QGGAGTCCGC GTCCGGCGGC

35251 ATGGCCGCCC GGCTCGGCGA CGCCGACCGC GCCCGCATCC ACCGCACCGG

35301 CGTGACGGGC CTGACCGACG AGCAGGCCCT GGCCCTCTTC GACGCXSGdbc

35351 TGACCGCCGA GCACCCCACG GTCCTGGCCA CCCGCTTCGA CCGCGCCGTG

35401 CTGCGCGGCC AGGCCGCOGC CCGCACCCTG CAGCCCGCCC TGCGCGGCCT

35451 GGTACGCACT CCGCGCCCCA CCGCGTCCGC CGGGGCCATC GGGTCCACCQ

35501 CAGCCACCGG GTCCGCCACG GACGAGAACG CGCCCTCCTC GTGGGCCGCC

35551 CGGCTCGCCC GGCTGTCC3GC CGCCGACCGC GACOGCGCCC TCAACGAACT

35601 CATTCGCGAG CAGATCGCGA CCGTCCTGGC ACACCCCTCA GCCGACACCA

35651 TCQAACTGGG CCGCGCCTTC CAGGAGTTGG GCTTCGACTC GCTCACCGCC

35701 CTGGAACTCC GCAACCGCCT CTCCACGGCC ACCGGCATCC GGCTGCCCGC

35751 CACCCTCGTC TTCGACCACC CGAGCCCCAC CGCCCTCGTA CGCCATCTCC

35801 ACAGCCATCT CCCCGACGAG GCCCAGCACA CGTCCCCGAC CGCCCGCGGC

35851 GCCTCTGCGG AQGGCACCGC CGCCACGGCC ACCGGCATCG ACGACGACCC

35901 GATCGCCATC GTCGGCATGG CGTGCCGCTA CCCGGGCGGC GTGACCTOGC

35951 CCGAGCAGCT GTGGCAGCTC GTGGCCACCG GCACCGACGC CATCQGCCCG

36001 TTCCCCGAGQ ACCGCGGCTG GGACACGGCC GGACTGTTCG ATCCCGACCC

36051 CGACCAGGTC GGCCACAGCT ACACCCGCGA AGGCGGCTTC CTCTACGACG

36101 CCGCCCGCTT CGACGCGGGC TTCTTCGGCA TCAGCCCGCG CGAGGCCGCC

36151 GCCACCGACC CGCAGCAGCG CCTOCTCCTG QAAACCGCCT GGCAGQCGTT

36201 CGAAC^CGCG 'GGCATCGACC CCGCCGCCCT GCGCGGCACC CCGTGCGGCG

36251 TCATCACCGG AATCATGTAC OAOGACTACG GATCCCGCTT GCTCGCGCGC

36301 AAACGGGACG GCTTCGAGGG CCGCATCATPG ACCGGCAGCA CGCCGAGCGT

36351 GGCCTCCGGC CX3GGTCGCGT ACACCTTCGG CCTGGAGGGC CCCX3CCATCA

36401 CGGTGGACAC CGCGTGCTCC TCCTCGCTGG TCGCGATGCA CCTGGCGGCG 

36451 CAGGCGCTGC GGCAGGGCGA GTGCQAACTG GCCCTGGCCG GGGGTGTGAC 

36501 CGTGATGGCC ACCCCGAACA CCTTCGTGGA GTTCTCCCGC CAGCGCGGCC 

36551 TGGCCCCCGA CGGCCGCTGC AAGCCGTTCG CCGCOGCGGC GGACGGCACC 

36601 GGCTGGGGCG AGGGCGCCGG ACTCGTCGTC CTGGAGCGCC TCTCCGACGC 

36651 GCGCCGCAAG GGACACCGCG TCCTCGCCCT GCTGCGCGGT TCGGCCGTGA 

36701 ACCAGGACGG CGCGAOCAAC GGCATGACC?G CCCCGAACGG TCCCTCGCAG 

36751 GAACGGGXCA TCCGCACCGC CCTGGCCGGC GCGGGCCGTG GTCCCGAGGA 

36801 CATCGACX^TG GTGGAGGCGC ACGGCACCGG CACCAGGCTC GGCGACCCGA 

36851 TCGAGGCGCA GGCCCTGCTC GCCACGTACG GGCAGGGGCG CdCGGAGGAC 

36901 CGCCCGCTCT GGCTCGGCTC GGTQAAGTCG AACATCGQCC ACACGCAGGC 

36951 CGCCGCCGGT GTCGCGGGCG TCATCAAGAT GGTCATGGCA CTGCGCCACG 

37001 AGCAACTGCC CACGACCCTG CACGCCGACG AGCCGACCCC CCACGTGCAA 

37051 TGGGACQGCG GCGQCQTACG TCTCCTGACC QAACCGGTCC CGTGGTCGCG 

37101 CGGCGAGCGC ACGCGGCGCG CCGGGGTGTC GTCCTTCGGG ATCTCCGGGA

37151 CGAACGCGCA CCTGATCCTG GAGGAGCCGC CGGAGGAGGA CCTGCCCGAG

37201 CGCGTGGCGG CGGAGCCGGG TGGGGTGGTG CCGTGGGTGG TGTCCGGGCG

37251 GACGCCGGAC GCGTTOCGTQ AACAGGOGCG GGGGCTCGGC GAGTTTGTCG 

37301 TCGGTGCCGG GGATGTGTCG GCAGCCX3AGG TQGGATGGTC ACTGGCCACG

37351 ACGCGGTCGG TGTTCGAGCA GCGGGCCGTG GTGGCGGGCC GGGACCGGGA 

37401 CGATCTGGTT GCCGGOATGC AGGCGCTGGC GGCAGGGGAG ACGCCGACAG

37451 ATGTCGTGTC CGGTGCGGCG GCTTCCTCCG GTGCGGGGCC GGTGTTGGTG 

37501 TTCGCGGGGC AGGGGTCQCA GTGGGTGGGC ATGGGTGCCC AGCTCCTTGA 

37551 C<3AGTCCCCC tSTCTTCXSCGG CGCGGATCGC GGAGTGTGAG CAGGCGCTGT 

37601 CGGCGTAGGT <3GACTGGTCG CTGAGTGATG TCCTGGGCGG GGACGGGAGT 
37651 GAQCTQTCCC GGGTCGAGGT CGTGCAGCCC GTGTTGTGGG CGGTAATGGT

37701 CTCGCTGGCT GCCGTCTGGG CGQATTAC3GG GGTCACTCCG GCCGCTGTGG 

37751 TGGGGCATTC GCAGGGTGAG ATGGCTGCCG CGTGTGTGGC GGGGGCGCTG 

37801 TCGCTGGA6G ATGCGGCGCX3 GATTGTGGCX3 GTACGCAGTG ACGCGCTTCG 

37851 TCAGCTGCAA GGGCACGGCG ACATGGCCTC ACTCGGCACT GGTGCCGAGC 

37901 AGGCCGCTGA GCTGATCGGT GATCGGCCGG GAGTGGTCGT CGCGGCAGTC 

37951 AACGGGCCGT CGTCTACCGT GATTTCGGGG CCGCCGGAGC ATGTGGCCGC 

38001 TGTGGTCGCG GAGGCX3GAGG CACGTGGTCT QCGCGCCCGT GTGATCGACG 
38051 TCGGGTATGC CTCGCACGGC CCCCAGATCG ACCAGCTCCA CGACCTCCTC

38101 ACCGAGGGCC TGGCTGACAT CCGGCCCGCG AACACGGACG TGGCCTTCTA 

38151 TTCGACGGTC ACX:GC?CGAGC QCCTGACGGA CACCACAGCC CTGGATACOO 

38201 ATTACTGGGT GACCAACCTC CGCCAGCCGG TCCGGTTCGC CGACACCATC 

38251 QAAGCGCTTC TCGCGGACGG CTATCGCCTG TTCATCGAGG CCAGCGCX3CA 

38301 CCCGGTGTTG GGCCTGGGCA TGGAGGAGAC CATCGAGCAG GCGGACATGC 

38351 CTGCCACGOT CGTCCCCACC CTOCGCCGCG ACCACGGCGA CACCACCCAG

38401 CTCACCCX3CG CCX3CCGCCCA CGCCTTCACX: <3CCGGCGCCG ATGTCXJACTG 

38451 GCGACGCTGG TTCCCX3GCCG ACCCCACCCC CCGTACCGTC GACCTCCCCA 

38501 CCTACX3CCTT GCAGCACCAG CACTACTGGC TGGAGGAGCC CAGTGGGCTC 

38551 ACCGGAGACG CCGCCGACCT CGGCATGGTG GCCGCCGGGC ATCCGCTGCT 

38601 GQGTGCCTOT GTGGAACTCG CGGAGAGCGA ^TCG^tACTTG TTCACCGGGC 

38651 GGCTCtCGCG CAGGGCTCCG TCCTGGCTGG CCGAACACGT GGTGGCGGGG 

38701 ACGGTTCTGG TGCCGGGTGC GGCGTTGGTG GAGTGGGTGC TGCGGGCCGG 

38751 CGATOAGGCG GGATGCCCGA CGATTGftGGA ACTGACGCTC CAGGCGCCGT 

38801 TGGTGCTGCC CGAGTCGGGC GGGTTGCAGG TTCAGGTGGT CGTGGGTGCG 
38851 ACCGATGAGC AGAGCGQCXIG TCGTGACGTA CACGTQTATT CfSAOGTCTGA
38901 GCAGGACGOG TCGGOGGTGT GGGTGTGCCA TGOCGTCGGT GTGGTGAGCT
38951 CCGAAATGCC AGAAGCGGCA GCCGAGTTGA GTGGGCAGTG GCCTCCTGCC
39001 GGGGCCGAAG CCGTGGATGT CGAGGACTTC TACGCGCGGG CCGCGGAGGC 
39051 CGGATACGCC TACGGTCCGG CGTTCCAGGG GCT6CGGGCG CTGTGGCGGC 
39101 ACGGGACGGA GCTGTTCGCC GAGGTGGTGC TGCCCGAACA GQCQC^QQG 
39151 CACGACGGTT TCGGCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCATCC
39201 GCTGATGCTC CTCGACCGGC CCGCGGACGG GCAQATGTGG CTGCCGTTCQ 
39251 CGTGGAGCX3G GGTGTCGCTG AACGCGGACC GGGC6ACCCA CGTCCGTGTC
39301 CGGCTCTCCC CGCGGGGGGA GGCGGCCGAG CGTGACCTGC QQGTCGTCAT
39351 CGCCGACGCG ACGGGCGCGC CCGTCCTGAC GGTCGACGCC CTGACCCTGC 
39401 GCGCGGCCGA TCCCGGCCGG CTGGGTGCGQ COGCCCGTGG CGGTGTCGAC 
39451 G6CCTCTACA CCGTCGACTG GACCCCGCTG CCCCTGCCCC AGCCCCTTCC
39501 GCTGCCGCGG ACGGATGCAG GGGGGAGTGC CGACTGGGTC ATACTCTCGG 
39551 ACAACTCCAG TGCAGCTCTG GCTGATGCGG TGTCGTCCGC GACOOCGQCA 
39601 GGTCGCGGAG CGCOGTGGGC ATTGCTCGCT CCCGTGGGTG GCGGCTCTGC
39651 CGATGACGGG CTGCCX30TG0 TGCGGCGQAC CCTCTCCCTC GTACAGGAGT
39701 TCCTGGCCGC CCCGGAGCTG ACCGAGTCCC GTCTCGTCAT CGTGACACGC 
39751 GGTGCCGTGG CCACCGACGC CGATGGTGAC GTCGCGGCGT CCGCGGCAGC 
39801 GGTATGGGGC CTGATCCGCA GCGCCCAGTC GQAGAACCCG GGCCGCTTCG
39851 TCCTGCTCGA CGTCGAGGAG GAGCACCTCC ACCCGGACGG CGGGGAACTG 
39901 CCGTACGCCG CCCTGCGCCA CGCCGTAGAG GA6CTCGACG AGCCTCAACT
39951 TGCCCTCCGC AGCGGCAAAT TCCTCGTACC GCGCATGACG CCCGCCGCCG 
40001 CCCCCOAGGA GCTCGTCCCG CCGGTCGGTA CGTCCGGCT6 GCGCCTCGGC 
40051 ACCTCCGGTA CGGCCACCCT GGAGAATCTG TCGGTGATCG ACGCTCCCGA
40101 QG00TTCGC30 COGCTGGAGC CCGGGCAGGT 6CGGATCTCC GTAGGQGCGG 
40151 CGGGCATGAA CTTCCGTGAt: GTGCTGATCG <:GTTGGQCAT GTATCCCGAC
40201 AAGGGCAGGT TGGCGGGAAG CGAGGGGGCC GGACATGTGA CGGAGGTGGG
40251 ACCGGGCGTC ACTCATCTGT CGGTCGQTGA CCGGGTGATG GGTCTGTTCG

40301 AGGGCGCGTT CGCTCCQCTG GCCGTCGCGG ACGCCCGGAT GGTCGTCCCG 

40351 ATTCCGGAGG GCTGQAQCTT CCAGGAQGCC GCG0CGGTGC CCGTGGTGTT 

40401 CCTCACGGCC TGGTACGGCC TCGTGGACCT CGGCCGCCTC CGGGCGGGCG 

40451 AATCGCTGCT CATCCACGCG GGCACC(3GCG GAGTGGGCAT GGCCGCCACC 

40501 CAGATCGCCC GCCACCTGGG CGCCGAGGTQ TTCGCCACCG CGAGCCCdGC 

40551 CAAGCACGGQ GTGCTCGACG GCATGGGCAT CGACGCGGCC CACCGCGCCT

40601 CCTCCCGTGA CCTCGACTTC ^GGAGACCT TGCGGGCGGC GACGGGCGGG 

40651 CGCGQCATGG ACGTCGTACT CAACAGTCTG GCCGGGGAGT TCACCGACGC

40701 CTCGCTGCGG CTOCTCX3CCG AQGGOGGGCG CATGGTGGAC ATGGGCAAGA 

40751 CCGACAAGCG CGACCCCGAC CGGGTCGCGG CCGAGCACGC GGGCOCGTGG 
40801 TACCGGGCCT TCGACCTCGT GCCGCACGCG GGGCCCGACC GGATCGGGGA 
40851 AATGCTGGCG GAGCTGGGCG AGTTGTTCGG CTCCGGCGCC CTGGCQCCGC 
40901 TGCCGGTCCA GACCTGGCCG CTGGGCCGGC3 CGCGTGAGGC GTTCCGGTTC
40951 ATGAGCCAGG CGAAGCACAC CGGCAAGCTG GTOCTOGAGA TCCCGCCCGC 
41001 CCTCGATCCG GACGGCACGG TGCTCATCAC CGGCGGCACC GG6GTCCTCG
41051 CCGCCGCGGT GGCCGAGCAT CTGGTGAGGG AGTGGGGCGT ACGACACCTG 
41101 CTGCTGGCCG GGAGGCGCGG TTCCGAGGCG CCCGGQAGCA GTGAACTCQC 
41151 CGAGGAACTG ACCGAGTTGG GGGCCGAGGT GACCTTTGCC GCGGCCGATG 
41201 TCAGTGATCC GGAGGCOGTG GCGGAGCTCG TCGGCAAGAC CGATCCGGGG 
41251 CACCCGCTGA CCGGTGTGAT CGACGCGGCC GGTGTGCTGG ACGACGCCGT 
41301 GGTCACCGCA CAGACCCCGG AGAGCCTCGC <3CGGGTGTGG GCGGCGAAGG 
41351 CGACGGCCGC ACACCTGCTG CACGAGGCGA CCCGGGAGGC GCGCCTCQGT 
41401 CTCTTCCTGG TGTTCTCCTC GGCGGCGGCG ACACTCGGCA GTCGGGGACA
41451 <3QCCAACTAC GCGGGGGCCA AGGCCTATTG GGACGCCCTC OTCCGGCAAC
41501 OGCGTGOCGA <5GGCCTGGCC GGTCTCTCGA TCGGCTGGGG TCTGTGGCAG 
41551 ACGGCGAGCG GCATGACCGG ACACCTCGGC GAGACGGACC TGGCACGCAT

41601 GAAGCGCACC GGGTTCACCC CGCTGACCAC CGAAGGTGGC TTGGCCCTCC 

41651 TCGACGCCGC CCGCGCCCAC GGCeGCCCGC ACGTGGTCGC GGTGGACCTC 

41701 GACGGQCQCO CCGTCGCCGC GCAGCCCGCC CCGTCCCGGC CCGCGCTCCT 

41751 GCGCGCCCTG GCCGCGGGTG CGACCCCGGG GGCCCGCACC GCCCGGCGCA

41801 CCQCOOCCGC GGGCAGCGTC GCCCCGGC6G QCQGTCTCGC CGACCGGCTC 

41851 GCCGGCCTGC CGCATCCCGA ACGGCGCCGG CTGCTGCTCG ACCTCGTACO 

41901 TGGCAACGTC GCCGGCGTCC TCGGGCACAG CGACCACGAC GCCGTCCGCC 

41951 CGGACACGTC GTTCAAGGAG CTCGGCTTCG ACTCCCTGAC CGCCGTGGAA 

42001 CTGCGCAACC GGCTGGCCGC CGGCACCGGC CTGAAGCTGC CCGCGGCGCT 

42051 CGTCTTCGAC TACCCCGAQT CGGCCACCXIT CGTCGACCAC CTCCTGGAGC 

42101 GTCTGTCGCC CGACGGCGCG CCGCCGCCCG TCAAGGACGC CGCGGACCCC 

42151 GTTCTCAACG ACCTCGGCAQ QATCGAGTCC TCCCTGGACG CX3CTCGCCCT 

42201 CGACGCGGAC GCGCGCAGCC GGGTCACCAG GCGTCTGAAC ACCCTGCTGT 

42251 CGAAOCTGAA CGGAGCCGCC ACCGCCGGCT CCCCGGCGGA CGTCACGGAC 

42301 CTGGACGCGC TGGACGCGCT GGAOQACJGTG TCCK3ACGAOG AGATGTTCGA 

42351 GTTCATCGAC CGAGAGCTGT QACCCCCCTG CCCGCCCCGT CCCCCTTCCC 

42401 CGCCCCCACX5 TTCCCCX3TGC CCOTOTCTGA TGGAGAAGTG ACGTTCGATG 

42451 TCGAGTGCTG AAGAGTCGAG TCCTGATGTG TCCGGCACGG GTGTGTCCGG 

42501 TACGGGAGAG TCCGCTACGG GTAOGTCGAG TACGGAAGCC AAGCTTCGGC 

42551 AGTATCTGAA GCGGGTCACG GTGGACCTCG GCCAGGCCCG CCGGCGGCTG 

42601 CGCGAGGTGG AGGAGCGGGC CCAGGAGGCG ATCGCCATCG TCTCCATGGC 

42651 GTGCCGCTTC CCCGGCGACA CCCGCACGCC CQAGGCGCTG TGGGACCTGQ 

42701 TCGCCGAGGG CGGCGACQCC ATCGACGACT TCCCCACCAA TCGCGGCTGG 

42751 GACCTOGAGA GCfCTCTACCA CCCSCGAOCCfC QACCACCCOG GCACCAGCTA 

42801 CGTCCGACGC GGCXSGGTTCC TGTACGACGC CCCCGCCTTC OAOGCGTGGT 
42851 TCTTCGGGAT CAGCCCGCGC GAAGCCCTGG CCATGGACCC GCAGCAGCGG 

42901 GTGCTCATGG AGACGGCCTG GCAGCTCCTG GAQCX3GGCCG GCATCGACCC 

42951 iSGCCTCGCTG AAGCTGAGCG CCACCGGCGT CTACATCGGC GCGGQCGTGC 

43001 TCGGGTTCGG CGQCGCOCAG CCCGACAAGA CGGTAQAGGG CCACCTCCTQ 

43051 ACCGGCAGCG CGCTGAGTGT CCTGTCCGGC CGCATCTCCT TCACGCTCGG 

43101 CCTCGAGGGC CCGTCGGTCA GTGTCGACAC GGCOTGCTCC TCCTCGCTGG 

43151 TCTCCATGCA CCTGGCGGCC CAGGCGCTGC GGCAGGGGGA GTGCGATCTC 

43201 GCGCTGGCCG GCGGTGTCAC CGTGATGTCG ACGCCCGGCG CGTTCACCGA 

43251 GTTCTCCCGC CAGGGC?GCQC' TQTCTCOGGA' CQGCCGCTCG AAGQCTTTCG 

43301 CGGCCTCGGC CGAGGGCACC GGTTTCTCGG AGGGCGCGGG ACTQCTCCTC 

43351 CTGGAGCGGC TCTCCGACGC GCGCfCXSCAAC GGCCACAAGG TGCTCGCGGT 

43401 GATCCGCGGC TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGTCTCACCG 

43451 CCCCCAACGG CCCCTCCCAG GAACGCGTGA TCCGCGCCGC CCTCGCCAAC 

43501 GCQ3GCCTGG GCGCCGCCOA" GQTCJGACGCG QTCGAGGCAC ACGGCACCGG 

43551 CACGAAGCTC GGCGACCCCA TCGAGGCCGG TGCGCTGCTC GCCACCTACG 

43601 GCCGCGACAG GGACGAGGAC CGGCCGCTGT GGCTGGGCTC GGTCAAGTCX3 

43651 AACATCGGTC ACCCGCAGGG CGCAGCAGGC GTCGCGGGCG TCATCAAGAT 

43701 GGTGATGGCG CTOGAGGGCQ AACTQCTCCC CGCCACCCTO TACGTCGACQ 

43751 AGCCGACCCC GCACGTCGAC TGGTCCTCGG GCTCCGTCAG GCTCCTCACC 

43801 GAACCGGTCC CGTGGACCCG CGGGGAGCGC CCGCGCCGCG CGGGCGTGTC 

43851 CGCCTTCGGC ATQTCCGGGA CQAACGCCCA CGTGATCCTG GAGGAGGCAC 

43901 CGCCCGAGGA GGCAGCGGCC GCGGAGACAC CGGCGGAAGG GACAGGCGCA 

43951 GTCGTCCC3GT GGGTCGTCTC CGGCCGGGGC GAGGAAGCGC TQCGGGCCCA 

44001 GGCCGCACAG CTOGCCGAGC ACGTGCGCGA CGAGGACCAG CGGCCGGCGT 

44051 caccxx:tgga ggtggggtgg tggcoxsgcca cgacaoggtc ggtgttcgag 

44101 aacoggggcg tcgtcqtcgg ggacgaocgc gacgcqctcc tcgaoggcct 
44151 CCGGTCGCTG GCGGCAGGTG AGGCGTCGCC GGACGTGGTG TCCGGGGCGG 

44201 TCGGCCCCAC GGGGCCCGGG CCGGTCATGG TGTTCCCCGG CCAGGGCGGC 

44251 CAGTGGGTGG GCATGGGGGC CCGGCTCCTC GACGAGTCCC CGGTGTTCGG 

44301 GGCCCGGATC GCCGAGTGCG AGCAGGCCCT GTCGGCGTAC GTGGACTGGT 

44351 CCCTGACCGA CGTGCTGCGC GGGGACGGGT CGGAGCTGQC CCGGATCGAC 

44401 GTCGTCCAGC CCGTGCTGTG GGCCGTCATG GTCGC6CTCG CCGCCGTCTG 

44451 GGCGGACCAG GGAATCGAAC CCX3CCGCCGT CGTCGGCCAC TCQCAGGGCG 

44501 AGATAGCCGC GGCGTGCGTC GTGGGCGCCA TCTCCCTGGA CGAGGCGGCC 
44551 CGCATCX3TCG CCGTACGGAG TGTGCTGCTG CGGCA6CTGT CCGGACGCX3G 

44601 CGGCATGGCG TCCCTGGGGA TGGGCCAGGA GCAGGCCGCC GACCTGATCG 

44651 ACGGACACCC GGGTGTGGTC GTCGCGGCCG TCAACGGGCC GTCGTCCAGC 

44701 GTCATCTCGG GCCCGCCCGA GGGCATCGCC GCCGTCQTCG CCQACGCCCA 

44751 GGAGCGGGGC CTTCGCGCCA GGGCCGTCGC CTCCGACGTC GCGGGCCACQ 

44801 GCCCGCAGCT GGACGCGATC CTGGACCAGC TCACGGAGGG CCTGGCCGGC 

44851 ATCCGGCCCG CCGCGACCGA CGTCGCGTTC TACTCCACCG TCACCGCCGG 

44901 GGACCTCACC GACACCACCG AACTCGACAC CGCGTACTGQ GTGCGGAACG 

44951 TGCGCCGGAC GGTGCGTTTC GCCGACACGA TCGACGCGCT GCTCGCGGAC 

45001 GGGTACCGCC TGTTCATCGA GGTGAGCCCC CACCCCGTCC TCAACCTCGC 

45051 GCTGGAAGGC CTCATCGAAC GGGCGGCCGT GCCCGCCACG GTCGTGCCCA 

45101 CCCTGCGCCG CGACCACGGC GACACCAGCC AGCTCGCCCG CGCCGCGGCC 

45151 CAOGGCTTCG CCGCOGGCGC GQACGTCGAC TGGCGGCGCT <3GTTCCCGGC 

45201 CGACCCCGCC CCCCGTACCG TCGACCTGCC CACCTACGCC TTCCAGCGCC 

45251 AGGACTTCTG GCCGGCGCCC GCCGGCGGGC GGTCCGGCGA CCGTGCCGGG 

45301 CTCGGCCTCG CGGCCTCGGG ACACCCGCTC CTGGGCGCCT CCGTGGGCCT 

45351 CXXX3AGCGGG GACGTACACC TGCTGAGCGG <3CGGGTGTCC CGGCAGTCCG 

45401 CCGCGTGGCT GGAGGACCAC GTCGTGGCGG tSCCAGGCCCT GGTGCOCGGC 
45451 GCGGCGCAGG TGGAGTGGGT GCTGCGGGCC GGCGACGAC3G CGGGCTGCTC 

45501 CGCCCTGGAG GAGCTGACGC TCCAGACGCC GCTCGTGCTG CCCGACACCG 

45551 GCGGCCTGCG GATCCAGQTC GTCOTCQAAG CQGCCGACGC ACACGGCCGG 

45601 CGCGACGTCC GGCTGTTCTC CCGCCCCGAT GACGACGACG CCTTCGCGTC 

45651 GACGCACCCC TGGACCTGCC ACGCCACGGG CGTGCTCGCC CCCGCCCCGA 

45701 CGGACGGCAC CAACGGAACG CGGGACGCCG CCGACACCCT GGACGGCGCA 

45751 TGGCCCCCGG CCGACGCCGA ACCCGTCCCC GCCGACGACC TCTACGCGCA 

45801 GGCCGACCGC ACCGGATACG GCTACQGCCC CGCCTTCCQG GGCQTACGGG 

45851 CX3CTGTGGCG CCACGGCAAG GACGTCCTGG CCGAGGTGAC GCTGCCCAAG 

45901 GAGGGCGGCG ACCCGGACGG CTTCGQTATC CACCCX3GCCC TCCTCOACGC 

45951 CGTCCTGCAA CCCGCCGCAC TGCTGCTGCC CCCGACCGAC GCCGAACAGG 

46001 TCTGQCTGCC GTTCQCCTQG AACGACGTGG CGCTGCACGC CGTACGGGCC 

46051 ACCACGQTCC GGGTGCGCCT CACCCCGCTC GGCGAGCGGA TCGACCAGGG 

46101 GCTGCGCATC ACCGTGGCCG ACGCCGTGGG CGCGCCCGTG CTCACCGTCC 

46151 GCGACCTGCG CTCGCGCCX:G ACCQACACAG QCCQCCTCSGC CGCGGCCGCG 

46201 ACCCGCGACC GGCACGGGCT GTTGGACCTG GAGTGGATCG CGCCGGAGAA 

46251 CGCGGCGGAG AACGCX3GCX3G GTCCGGCCCQ jGfGAOGOGTCC GAAGGGTGGG 

46301 TGACACTCGG CGAGGACGCC GCGAGCCTCG CGGACCTGCT GGCGTCCGTC 

46351 GAGGCGQGCG CTCCGGCGCC GCAGCTCGTQ GCCQCCCCCG TCGAACCCGA 

46401 CCGGACCGAC GACGGCCTGG CACTCGCCAC CCACGTCCTC GACCTCGTAC 

46451 AGACCTGGCT CGCCTCGCCC CTGCACGACT CCCGCCTGGT CCTGGTGACG 

46501 CGAGiSaOOAG TGACGGATGC GOATGTGGAT OTGGCTGCCG CGGCCGTTTG 

46551 GGGTCTGGTA CGCAGCGCCC AGTCGGAGCA CCCCGGCCGC TTCACGCTGA 

46601 TCGACCTCGG CCCCGACGAC ACGCTTGOCG CAGCCATGCA GGCGGCGCAC 

46651 CTGOAAGAGC CGCAACTGGC GGTGCACGGC GGCGAGATAC iSAGTGCCGCG 

46701 ACTGGTCCGC GCCAGGACCG ACCCGACCGC CCCGAACGGG ACACCGGAGG 
46751 CCGACCOGAC GGCGGACCCG TCCGAAGGAC TCCaCCGGAA CGGTACGGTT 
46801 CTCATCACCG GCGGCACCGG CGTACTCGGC CGACTGGTGQ CCGAACACCT 
46851 GGTCACGGAO TGGGGCQTAC GCCACCTGCT GCTCGCGAQC CGACGCGGCG 
46901 ACCAQGCOCC GGGTAOCGCC OAACTCCQCO CCCOCCTQAG CGAATTGGGA 
46951 GCATCQGTCG AGATCGCCCC GGCCGATGTC GGCGACGCGQ AAQCQGTCGC 
47001 CGCACTGATC GCGTCGGTCG ACCCGGCGCA CCCGCTCACC GGTGTGATtC 
47051 ACGCGGCCGG TGTCCTGGAC GACGCCGTGA TCACCQCCCA GACCCCCGAQ 
47101 AGCCTCGCGC GQGTGTGGGC GACOAAGGCG ACGGCGGCCC QCCATCTGCA 
47151 CGAGGCGACA CGGGftORCAC CCCTCGACTT CTTCOTOOTO TTCTCCTCGG 
47201 CGGCCGCCTC GCTCGGCAGC CCCGGCCAGQ CCAACTACGC GGCGGCCAAC 
47251 QCCTATTOCG ACQCCCTCQT CCAOCACCGC CX3CGCCCAAG GGCTCQCGGG 
47301 CCTCTCGATC GCCTGGGGCC TGTGGCAGGC GACCAGCGGC ATGACCGGQC 
47351 AfiCTGAGCGA 6ACCGACCT6 GC6CGCATGA AGCGCACCGG GTTCGCCGCX} 
47401 CTQACCGACG AGGGCGGCCT GGCCCTGCTC GACGCCGOCC GTGCCCACQA 
47451 CCSGGCCTAC GTOGTCGGGG COQACCTCQA CCCGCGCGCC GTGACCQATG 
47501 GCCTGTCCCC GCTCCTCCGC GCCCTCACX3Q CGCCCGCCAC GCGGCX3GOGC 
47551 QTQGCCTCCG AAGGCCTCGC CGACGGGGCG CTCGCGACCC QCCTGOCCGG 
47601 CCTCQACGOQ GACQGCXIGCC TAASGCTCCT CACCQATOTC GTACQCGAQT 
47651 ACGTCGCGGC CGTCCTCGGC CATGGTTCCG CCGCCCGGGT GGGCOTCORC 
47701 ATCQCCTTCA AGGACCTGGO TTTCGACTCG CTGACCGCGG TGGAGCTGCG 
47751 CAACCGGCTG TCGGCCGCCT GTOACGTQCG GCTGCCCX3CC ACACTGATCT 
47801 TCQACCACCC CACCCOGCAG <JCTCTCGCCA CCCACCTGGT GQACCGCTTG 
47851 GCGGGCAQCA CCTCCGCQAC CaOSAOGOTS AATGCGACGG CGCCGGCAGC 
47901 CX3CCCACGTC GCCGCAGGGG CCQACGTCGA CGCAGACACC CACGACCCGQ 
47951 TCGCCATOGT CQCCATGACG TGCCGGTTOC aSQGOGGCGT CGCGTCCCCG 
48001 OACGACCTGT GGGACCTQCT <S3AOGCftOGC AAQGACGCOA TGGGCGCCTT 
48051 CCCCACCGAC CGCGGCTGGG ACCTGGAACG CCTCTTCCAC CCCGACCCGG 
48101 ACCACCCCOG CACCAGCTAC ACCGACCAGQ QCGQATTTCT TCCCGACGCG 
48151 GGTGATTTCG ATGCGGCGTT CTTCGGGATC AATCCGCGGG AGGCGCTGGC 
48201 GATGGATCCG CAGCAGCGGT TGTTGCTGGA GGCGTCGTGG GAGGTGTTGG 
48251 AGCGTGCGGG TATCGATCCG ACGACGCTCA AGGGCACCCC GACCGGCACC 
48301 TACGTGGQCC TCATQTACCA CGACTACGCC AAGTCCTTCC CCACGGCCGA 
48351 CGCCCAGTTG GAGGGCTACT CCTACTTGGC GAGCACCGGC AGCATGGTCT 
48401 CCGGCCGCGT CGCCTACACC CTGGGCCTTG AAGGTCCGGC GGTGACGGTC 
48451 GACACCGCGT GCTCCTGCTC CCTGGTCTCC ATCCACCTGG CGACGCAGGC 
48501 ACTCCGGCAC GGCGAGTGCG ACCTCGCCCT GGCAGGCGGT GTGACCGTCA 
48551 TGGCCGACCC GGACATGTTC GCGGGCTTCT CGCGCCAGCG CGGCCTCTCA 
48601 CCTGACGGCC GCTGCAAGGC CTACGCCGCC GCGGCCGACG GAGTCGGATT 
48651 CTCCGAGGGA GTGGGCGTAT TGCTCCTTGA GCGGTTGTCG GATGCGCGGC 
48701 GTCATGGGCG TCGGGTGTTG GGTGTGGTGC <5GGGTTGGGC GGTGAATCAG 
48751 GACOGTOCQA OTAAT6GQTT GACGGCGCCG AATGGTCCGT CGCAGGAGCG 
48801 GGTGATTCGT CAGGCGTTGG CCAGTGGTGG GTTGTCGTCG GTGGATGTTG 
48851 ATGTGGTGGA GGGGCATGGG ACGGGGACCA CGTTGGGTGA TCCGATCGAG 
48901 GCGCAGGCTC TGCTGGCCAC ATATGGGCAG GGGCXSTCXIGG AGGACCGTCC 
48951 GTTGTGGTTG GGGTCGGTGA AGTCGAACAT TGGTCATACG CAGOCGGCTG 
49001 CX3GGTGTTGC GGGTGTCATC AAGATGGTGA TGGCGATGCG GCATGGTGTG 
49051 GTGCCGGCGA GTTTGCATGT <3GATGTGCCG TCGCCGCATQ TGGAGTGGGA 
49101 TTCGGGTGCG GTGCGGTTGG CGGTTGAGTC GGTGCCATGG CCGCAQGTGG 
49151 AGGGTCGTCC GCX5TCGGGCG GGTGTGTCGT CGTTCQOCGC TTCGGGGACG 
49201 AATGCGCACG TGATCGTGGA OTCTGTTCCC GATGGGCTOG AGGAGGACTC 
49251 GGTATGGGTC QGOGQTGAGG CTCTTGAGAC <3GAGACTGAC GGGCGCTTGG 
49301 TGCCGTQGGT GGTGTCGGCC CGCAGCCC3GC AGGCCCTGCG CGACCAGGCA 
49351 CTACGCCTGC GTGACTTTGC CAGTGACGCG TCGTTCCGCG CGCCGCTCGC 
49401 CGACGTGGGC TQGTCGCTGC TGAAGACGCQ TGCGCTGCAT GAGCATCGCG 
49451 CCGTTGTGGT GGGCGCGGAG CGGGCAGAGC TGATCGCCGC TCTGGAGGCG 
49501 CTGGCGACGG GTGAGCCGCA TGCGGCGCTG GTCGGCCCGG CTTGCTCGCA 
49551 GGCTCGGGTG GGTGGCGATG ACGTGGTGTG GCTGTTCAGT GGTCAGGGCA 
49601 QTCAGTTGGT CGGTATGGGT GCTGGTTTGT ATGAGCGGTT CCCGGTGTTT 
49651 6CGGCTGCGT TTGATGAGGT GTGCGGCCTG TTGGAGGGGC CQTTGGQCGT 
49701 GGAGGCGGGT GGGTTGCGGG AGGTGGTGTT CCGTGGCCCG CGGGAGCGGT 
49751 TGQATCACAC GGTGTGQGCG CAGGCGGGGT TGTTTGCGCT GCAGGTGGGG 
49801 TTGGCCCGGT TGTGGGAGTC GGTCGGGGTG CGGCCGGATG TGGTGCTCGG 
49851 GCATTCGATC GGTGAGATCQ CX3QCCQCGCA TGTGGCGGGG 6TTTTTGATC 
49901 TGGCGGATGC GTGTCGGGTG GTGGGTGCGC GGGCGCGTTT GATGGGTGGG 
49951 CTGCCTGAGG GTGGGGCGAT GTGCGCGGTG CAGQCCACGC CCGCCGAGCT 
50001 GGCCGCCGAG GTGGACGGAT CGGCTGTAAG TGTGGCGGCA GTCAACACCC 
50051 CCGACTCCAC GGTGATTTCG GGCCCGTCGG ACGAG6TGGA CCGGATTGCT 
50101 GGGGTGTGGC GGGAGCGTGG GCGCA7W3ACG AAGGGGCTGA GCGTCAGTCA 
50151 TGCCTTCCAT TCGGCGTTGA TGGAGCCGAT GCTCGCGGAG TTCACCGAAG 
50201 CGATACGAGG GGTCAAGTTC AGGCAGCCGT CGATCCCGCT CATGAGCAAT 
50251 GTCTCCGGAG AGCGGGCCGG CGAGGAGATC ACGGATCCGG AGTACTGGGC 
50301 GAGGCATGTA CGTAATGCGG TGCTCTTCCA QCCCGCCATC GCCCAAGTAG 
50351 CGGATTCAGC GGGCGTGTTT QTGGAGCTCG GCCCCGCGCC TGTGCTGACC 
50401 ACGGCCGCCC AGCACACCCT GGACGAGTCG GACAGCCAGG AGTCGGTGCT 
50451 GGTGGCGTCT CTCGCTCGaTG AGCGTCCTGA GGAGTCX3GCG CTTGTGGAGG 
50501 CGATGGCTCG TCTGCATACC GCTGGTGTTG CTGTGGACTG GTCGGTGTTG 
50551 TTCGCGGGTG ATCGTGTGCC TGGGGTGGTG GAGTTGCCGA GGTATGCGTT 
50601 CCAGCGGGAG GGGTTCTGGT TGAGTGGCCG TTCTGGGGGT GGGGATGCGG 
50651 CGACTTTGGG GTTGGTGGCG GCGGGGCATC CGTTGTTGGG TGCGGCGGTG 

50701 GAGTTCGCGG ACCQGGGTGG QTGTCTGCTG ACCGGTCGTC TGTCGCGGTC 

50751 TGGGGTGTCG TGGCTTGCTG ATCATGTGGT GGCGGGTGCG GTTTTGGTGC 

50801 CGGGTGCTGC GTTGGTGGAG TGGGCGTTGC GGGCCGGTGA TGAGGTCGGT 

50851 TGTGTGACGG TGGAGGAGTT GATGTTGCAG GCGCCTTTGG TGGTGCCTGA 

50901 GGCGTCGGGT CT(3CGGGTTC AGGTGGTGQT TGAGGAGGCG GGTGAGGA^IG 

50951 GGCGGCGCGG TGTTCAGATC TACAGCCGGC CCGACGCGQA CQCCGTGGGC 

51001 GGCGATGACT CGTGGATCTG CCACGCGACC GGCGTACTGT CACCCGAAAG 

51051 CGCTCGTCTG GACACGQAGT TGGGTGGCOT CTGGCCACCG GGCGGTGCCG 

51101 AACCGCTGGA TGTCGACGGC TTCTACGCGC AGGCCGGTGA GGCCGGGTAC 

51151 GGATACGGTC CGGGGTTCCG GGGGCTGCGT GCCGTGTGGC GGCACGGCCA 

51201 GGACCTGCTG GCCGAGGTCX3 TCCTGCCCGA AGCCGCC3GGT GCCCATGACG 

51251 GCTACGGGAT CCACCCCGCC CTCCTCGACG CCACCCTCCA TCCGCTGCTC 

51301 GCCGCCCGCT TCATGGACGG TTCCGAGGAC GATCAGCTCT ACGTACC6TT 

51351 CGGGTGGGCC GGAGTGTCTC TGCGGGCGGT GGGAGCCACG ACTGTGGGCG 

51401 TGGGCCTCCG TCCGGTCQGG GAGAGCGTCG ACCAAGGGCT GAGCGTGACG 

51451 GTCACCGATG CGACCGGCGG TCCCGTTCTG AGCGTCGACT CCCTCCAGAC 

51501 CCGCCCCGTG AAGCCGAGCC AATTGGCTGC GGCCCAACAG CCGGACGTAC 

51551 GCGGTCTGTT CACTGTQGAG TGi3ACGCC3GC TGCCGCAGAC GGATGCCGAC 

51601 GGGGAGGCCG ACTGGGTTGT GCTCrCGGAC GGTGTTGGCC GTCTGGCTGA 

51651 TGTGGTGTCG GCGQCGGQTQ GTOAAGCGCC GTGGGCAGTG GTCGCTCCTG 

51701 TCGATGCGTC TGTGGGCGAC GGCCGTGAGG GTCTTGACGG TCGGCTGGTC 

51751 GTGOAGCGGG TaCTjSTCACT CGTACAGGAG TTCCTGGCCC TGCCGGAGCT 

51801 GGCCGAGTCC CGTCTCCTOG TGGTGAC5GCG CX3GTGCGGTG GCCACCGGCG 

51851 TCGACGGTGA CGGTGACGTG t3ACGCX5TCCG CCGCAGCTGT ATGGGGCCTG 

51901 GTCCGCAGTG CTCAGTCCGA GAATCCGGGC C2GCTTCATCC TGCTCGACGT 
51951 GGACGOCGAC GGCGACOACC AGGGCCCGGA CCTGAACGGC CGGCATCTGC 
52001 CCCACGCCAC CCTGCGTCAC GCCGCCGAGG AACTCGACGA GCCCCAACTC 
52051 GCGCTGCGGG AAGGGACGCT CTACGTCCCC CGACTGACCC AGGCGCGCCA 
52101 GTCCGCCGAA CTCGTCGTGC CGCCCGGTGA ACCGGCGTGG CGCCTGCGGA 
52151 TGGTGCACGA CGGCTCGCTG GACGCCCTGG CGGCAGTGGC CTGCCCGGAG 
52201 GCCCTGGAGC CCTTGGCGCC GGGGCAGGTG CGTATCGCCG TAGACGCcfec 
52251 GGGCATCAAC TTCCGTGACG TACTGGTGGC CTTGGGTATQ GTGCCCGCGT 
52301 ACGGGGCCAT GGGTGGCGAA gStGCCGGTG TCGTGACGGA GGTCGGTCCC 
52351 GAGGTCAeCG ATGTCTCGGT GGGCGACCGC GTGATGGGCG TGTTCGAGGG 
52401 CGCGTTCGGC CCTGTGGTGA TCGCCGAGGC GCGGATGGTC ACACCTGTCC 
52451 CGCAGGGCTG GGACATGCGG GAGGCGGCCG GTATTGCGGC GGCCTTCCTG 
52501 ACGGCTTGGT ACGGGTTGGT GGAGCTGGCC GGTCTGAAGG CGGGCGAGCG 
52551 GGTGCTGGTC CATGCCGCGA CGGGTGGTGT GGGGATGGCG GCGGTGCAGA 
52601 TCGCCCGGCA TGTGGGTGCC GAGGTGTTCG CCACCGCGAG TCCGGGCAAG 
52651 CACQCCGTGC TGGAGGAGAT GGGCATCGAC GCCGCCCACC GCGCCTCCTC 
52701 CCGGGACCTC GCGTTCGAQG GCACGTTCAG GGAAGCAACG GGCGGCCGCG 
52751 GCATGGACGT CGTGCTCAAC AGCCTTGCCG GCGAGTTCAT CGACGCCTCT 
52801 CTGCGGTTGC TCGGCGAGGG CGGCCGGTTC CTGGAGATGG GCAAGACCGA 
52851 TGTGCGGGCC GCCGAAGAGG TGGCTGCGGA GCACGCGGAC GTCTCGTACA 
52901 CQGCGTACGA CCTCGTGGGT GATGGCGGAC CCGACCGCAT CAGCAACAXG 
52951 CTGGACAAGC TCGTCGAATT GTTCGCCTCA GAACGGCTTA AGCCGCTGCC 
53001 <3GTACGTTCC TGGCCGCTGG ACAAGGGGCA GGAGGCGTTC CGGTTCATGA 
53051 GTCAGGCGAA GCACACCGGC AAGCTGGMC TTGAGATCCC GCCTGCCCTC 
53101 GACCCGGAGG GCACGGTTCT GGTCACGGGG GGCAGGGGTG CGCTGGGGCA 
53151 GQTOGTQGCC GAGCATCTGG TCCGGGAGTG GGGCGTACGG CACCTGCTGC 
53201 TGGCCAGCCG TCGCGGTCCG GAGGGGCCGG -GCMCGACGA ACTGGCCTCG 
53251 AAGCTCACCG GGTTGGGTGC CGAGGTCACC ATTGTCGCGG CCGATGTCAG 
53301 COACCCGGCC TCGGTGGTGG AGCTGGTCGQ CAAOACGGAT CCCTCGCATC 
53351 CGTTGACGGG TGTCGTGCAC GCGGCGGGCG TGTTGGAGGA CGGTGTCGTG 

53401 ACCGCTCAGA CGCCTGAGGQ GCTGGCGCGG GTGTGGGCGG CCAAGGCTGC 

53451 TGCGGCGGCG AATCTCCATG AGGCGACCCG GGAOATGCGT CTCGGCCTGT 
53501 TCGTGGTGTT CTCCTCGGCG GCCGCCACGC TCGGCAGTCC GGGCCAGCfCC 
53551 AACTACGCGG CCGCCAATGC CTATTGCGAC GCGCTGATGC AGCACCGACG 
53601 GGCGGTGGGC CAGGTCGGCC TOTCGGTCGG CTGGGGTCTC TGGGAGGCGC 
53651 CGGACGCCAA GCCGGGTGTT GCCGCCX3ACQ CCAAGGCGAG TGCTGCCACC 
53701 GTCGGCAAGG CGAGTGCTCT ATCCGACGGC ACGAACGGCA GGGCTCCCCA 
53751 GGACACGACC GGCACCGCCC CCCAGGGCAT GACCGGCGGA CTCACCGACA 
53801 CCGACGTAGC CCGGATGGCA CGTATCGGCG TCAAGGGCAT GAGCAACQCC 
53851 CACGGTCTCG CCQTGTTCGA CGCCGCGCAC CGCCACGGCC GCCCCCACCT 
53901 GGTCGGCTTC AACCTCGACC TGCGCACCCT GGCCACGCAC CCCCTGCACA 
53951 CCCGGCCCGC CCTTCTGCGC GGCCTGGCCA CCCCCACCGC CGGCGGGGCG 
54001 AGCAGGCCGA CCGCGACGGC GGGCGGACAG CCCGCCGACC TGGCGGGCCG 
54051 GCTXSGCCGCG CTGTCGCCGT CGGACCGGCA CCACACOCTG GTCCGGCTCA 
54101 TCAGGGAACA GGCCX3CCACC GTGCTCGGGC ACCACCCGGA CAGTCTCACC 
54151 ACGGGCAGCA CCTTCAAGGA ACTCGGATTC GACTCCCTGA CCGCGGTCGA 
54201 ACTGCGCAAC AGGCTGTCOG CCGCCACCGG TCTCCGGCTC CCCGCCGGCC 
54251 TGGTCTTCQA CCACCCGGAC GCCGACATCC TGQCCGAACA CCrCGGCGCG 
54301 CAACTCGCCC CCGACGGGGA CACCCCCGCC GGTGCGGAAG CCACCGACCC 
54351 GGTCCTCCGC GACCTGGCGA AACTCGAGAA CGCCCTCMC TCCACCCTCG 
54401 TCGAGCACCT CGACGCCGAC GCGGTCACQG CCCGACTGGA AGCACTCCTG 
54451 TOGAACTGQA AGGCGOGGRG QGCGGCGCGC GGCTCGGGCA GCAOGAAGGA 
54501 <3CAGCTCCA<3 GTTGCCAOGA CX3GACCAGGT CCTCGACTTC ATCGACAAAG 
54551 AACTGGGTGT GTG;1?VACGAC CGTGCACGGC GCGACAACCA CGCTGAAGGC 
54601 TGGGTGAACT CTCATGGCGA GTGAAGAGGA ACTGGTCGAC TACCTCAAGC 
54651 GGGTCGCCGC CGAACTGCAC GACACCCGGC AGCGCCTGCG CGAGGTCGAG 
54701 GACCGGCGGC AQGAGCCGGT GGCCGTCGTC GGCATGGCCT GCCGTTTCCC 
54751 CGGCGGCATC GAGACGCCCG AGGGACTGTG GGAQCTGGTC QCGGCCX3GCG 
54801 ACGACGCCAT TGAGCCCTTC CCCACCGACC GGGGCTGGGA CCTGGAAC3GC 
54851 ATCTACCACC CQGACCCCGA CCACCC6GGT ACCTGCTACG TGCGGGAGGG 
54901 CGGGTTCCTA GCCGCCCCTG AdCGGXTCGA CTCCGACTTC TTCGGCTTCA 
54951 GCCCGCGCGA G6CCCTGGCC AGCAGCCCGC AACTGCGACT GCTCCTGGAG 
55001 ACGTCCTGGG AGGCCCTCGA ACGGGCGGGC ATCAACCCCG CCTCGCTCAA 
55051 QGGCAGCCCC ACCGGCGTCT ACGTCGGCGC CGCGACCACC GGCAACCAGA 
55101 CGCAGGGCGA CCCCGGCGGC AAGGCQACCG AGGGTTACGC GGGCACCGCG 
55151 CCCAGCGTCC TCTCGGGCCG GCTCTCGTTC ACGCTCGGCC TGGAGGGCCC 
55201 GGCGGTGACC GTCGAGACAG CGTGCTCCTC CTCGCTGOTG GCOATGCACC 
55251 TGGCGGCCAA CGCCCTGCGC CAGGGCGAGT GCGACCTCGC CCTCGCGGGC 
55301 OGCGTCACCG TCATGTCCAC CCCCGAGGTG TTCACAGGCT TCTCGCQTCA 
55351 GCGGGGACTG GCCCCCGACG GCCGCTGCAA QCCGTTCGCC GCC3GCGGCCX3 
55401 ACGGCACGGG CTGGGGCGAG GGCGCGGGCC TGATCCTCCT GGAGCGCCTC 
55451 TCCGACGCCC GCAGGAAGGQ CCACAAGGTe CTCGCX3GTGA TGCGGGGCTC 
55501 GGCGATCAAC CAGGACGGCG CGAGCAACGG CTTCACCGCG CCCAACGGCC 
55551 CCTCJGCAGCa CCGOQTCATC CGCCAGGCAC TCTCCAGCGC CCACCTCTCC 
55601 ACGTCGGAGA TOGACGTCGT CGAGGCGCAC GGCACCGGCA CCA6GCTCGG 
55651 CGACCCCATC GAGGCCGAGG CGCTCATCGC CACCTACGGC AAGGAGCX3CG 
55701 AGGACGACCG TCCCCTGTGG CTXXSGCTOQG TCAAGTCCAA CATCGGCCAC 
55751 ACGCAGGCCG CCGCGGGGGt CGCCGGAGTC ATCAAGATGG TGATGGCGCT 
55801 ACAGCXSCOAA CTGCTTCCCG GCACGCTGAA C3GTCGACGAG CCGACCCCGC 
55851 ACGTCCAGTG GGAGGGCGGC GQCGTACGCC TCCTGACCGA ACCGGTCCCG 
55901 TGGTCGCGCG GCGAACGCCC GCGCCGCGCC GGAATCTCCT CCTTCGGCAT 
55951 ATCGGGCACG AACQCGCACG TGGTCCTGGA GGAGGCGCCG CCGGAGGAGG 
56001 ACGTGCCGGG CCCCGTGGCT GCGGAGCCGG AAGGGGTGGT GCCGTGGGTG 
56051 GTCTCCGCGC GGACCGAGGA GGCGTTGA6C GAACAGGCGC GGCGCCTGGG 
56101 CGAGTTCGTG GCCGACACGG ACCCGTCGAC CGCTGACGTC GGGTGGTckc 
56151 TGACCACGAG CAGGGCGATC CTTGAACACC GCGCTGTGGT GGTGGGGCGT 
56201 GATCGGGATG CGCTGACGGC CGGCCTGGCG GCGTTGGCCG CGGGTGAGGA 
56251 GTCGGCX3GAT GTGGTGGCTG GGGTGGCCGG TGATGTGGGT CCTGGGCCGG 
56301 TGTTGGTGTT TCCGGGGCAG GGGTCGCAGT GGGTGGGCAT GGGCGCCCAG 
56351 CTCCTTGACG AGTCGCCCGT CTTCGCGGCG CGGATCGCGG AGTGTQAGCA 
56401 GGCGCTGTCG GCGTACGTGG ACTGGTCXJCT GAGTGCGGTG TTGCGCGGGG 
56451 ATGGGAGTGA ACTGTCCCGG GTCGAGGTCG TGCAGCCGGT GTTGTGGGCG 
56501 GTGATGGTCT CGCTGGCTGC CGTCTGGGCG GATTACGGGG TCACCCCGGC 
56551 CGCTGTGATC GGGCACTCGC AiCSGGCXSAGAT GGCCGCCGCG TGCGTGGCGG 
56601 GGGCOCTGTC TTTGOAGGAT GCGGCGCGCG TCGTGGCCGT ACGCAGTGAC 
56651 GCGCTTCGTC AGCTGATGGG GCAGGGCGAC ATGGCGTCGT TGGGCGCCAG 
56701 CTCGGAGGAG GCGGCTGAGC TCATCGGTGA TCGGCCGGGC GTATGCATCG 
56751 CAGCGGTCAA CGGGCC3GTCC TCGACAGTCA TTTGAGGACC GCCGGAGCAT 
56801 GTGGCAGCCG TGGTCGCGGA TGCGGAGGAA CGTGQTCTGC GCGCCCGTGT 
56851 CATCGATGTC QGCTATGCCT CGCACGGTCC CCAGATCGAT CAGCTCCACG 
56901 ACCTCCTCAC CGACCGGCTC GCCGACATCC GGCCCGCGAC CACGGACGTG 
56951 GCCTTCTATT CQACGGTCAC CGCCGAGCGC CTGAGGGACA CCACGGCCCT 
57001 GGATACGGAT TACTGGGTTA CCAACCTCCG CXIAGCCGGTC CGTTTCGCCG 
57051 ACACCATCGA TGCGCTTCTC GC3GGACGGCT ATCGCCTGTT CATCGAGGCC 
57101 ftGOGOGCAOC GGGTOCTGGG TCTGGQCATG GAGOAGAOCA TCGAGCAGGC 
57151 GOACATCCCC GCCACGGTCG TCCCCACCCT GCGCCGCGAT CACGGTGACA 

57201 CCACCCAGCT CACCCGTQCC GCAGCXSCACG CCTTCACCGC CX3GCGCCACC 

57251 GTCGACTGGC GGCGCTGGTT CCCGGCCGAC CCCACCCCCC GCACGATCGA 

57301 CCTGCCCACC TACGCCTTCC AGCGCOGCAG CXACTGGTTG CCGGTGGACG 

57351 GTGTCGGAGA TGTGCGGTCG GCCGGGCTGC GGCGGGTGGA ACACTCOCTG 

57401 TTGCCCGCGG CGCTCGGTCT CGCCGATGGT GCGCTCGTGC TGACCGGAtG 

57451 GCTCGCGGCG TCCGGTGGTG GTGGCGGTTG GCTCGCGGAT CACGCGGTGG 

57501 CGGGCACGAC GCTCGTCCGC ^TGCCGCGC TGGTCGAGTG GGCGTTGCGG 

57551 GCCGCCX3ACG AGGCGGGCTG CCCCTCCX:TT QAGGAGCTGA CGCTCCAGGC 

57601 ACCTCTGGTG CTGCCCGGCT CCGGGGGCCT CCAGGTCCAA GTGGTCGTGG 

57651 GTCCGGCCQA CGQACAQGQC GGCCGGCGTG aggtgcgcgt cttctcgcgt 

57701 GTCGACTCGG ACGACGAGGC AGCGGGGCAG GACGAGGGGT GQTCGTOTCA 

57751 CGCGACCGGT GTGCTGAGCC CCGAGCCCGG TGCGGTACCG GACGGGCTCA 

57801 GCGGACAGTG GCCCCCGACG GGCGCCGAGC CGCTGGAGAT CAaTGATCTC 

57851 TACGA0CAGG CGGCATCGGC -GGGATACGAG TACGGGCCGT CGTTCCGGGG 

57901 CCTGCGCTCC GTGTGGCGGC ACXJGGCATAA CCTGCTGGCA GAGGTGGAGC 

57951 TGCCCGAACA GGCAGGTGCG CACGACGACT TCGGCATCCA CCCCGTACTG 

58001 CTGGACGCCG CGCTGCACCC GGCGCTGCTO CTCGACCAGA ACGCGCCCGG 

58051 CGAAGAGCAA GAGCCAGCCC AGCCCGCTCT TCGCCTGCCG TTCQTGTGGA 

58101 ACGGCGTCTC CCTGTGGGCC ACCGGCGCCG CGACCGTGGQ GGTACGGCTG 

58151 GCCCCX3CACG GGGGAGGGGA GACGQAOGAT AGCGCCGGGG TGCGCGTGAC 

58201 GGTCGCCGAC GCCACCGGAG CACCGGTGCT GAGCGTGGAC TCCCTGGCTC 

58251 TGCGCCCCGC TGACCCCGAA CTGCTGCGCA CGGCCGGTCG GGCGGGCAGC 

58301 GGGACCAACG GCTTGTTCAG <3GTGGAGTGG AOCGCTCTGC CCCCGGCGGA 

58351 <:gtggcggac CACGCGGCAG GCGACGGCTG GGCGGTGCTC GGTCAGGACG 

58401 TACCCGACTG GGCCGGAGOG GACATGCCCC GGCATCCCGA CATQQCX!TCC 
58451 CTGTCGGCCG CGCTGGACGA GGGAACGCAG GCCCCTGCGG CCGTCTTCGT 
58501 GGAGACCACA GCCACATCGC ACGCCACACC GAACACCGCA GCGGACGTGA 
58551 CGCTCGACGC GTCCGGCCGG GCGGTCGCCG AGCGCACCCT GCACCTGCTG 
58601 CGGGACTGGC TCGCCGAACC GCfGCCTCGCC GAGACCCGGC TCGTCCTCAT 
58651 CACCCACCAC GCGGTGACGA CCCCGGCGGA CGACGACGTG AACGCCGCAC 
58701 CCCTCGACGT CCCGGCCGCC GCCCTGTGGG GACTGATCCG CAGCGCAj^G 
58751 GCCGAACACC CGGACCGCTT CGTTCTGTTG GACACCX3ACG CGAAGGCCAA 
58801 CACCQACCCC GGCCCCGACA dCAGTACTGA CCACAGCACC GCATCGGGTA 
58851 CGTACCGAAC CGTCATCGCG CfGGGCCCTCG CCACCGGGQA GCCACAGCTG 
58901 GCCGTGCGCG CGGGAGAACT GCTGGCTCCC CGCCTCGCCC GAGCCGCCAC 
58951 CCCCACACCC GAGACCCCCA CACCCGAGAC ACAGCCCGAC ACCGGATCCG 
59001 GGTCCGAGGC CGGGGCCGGG TCCGGATCTG GACCCGGCGC GACACTGGAC 
59051 CCCGACGGCA CCGTCCTCAT CGCGGGCGGC ACCGGCATGA TGGGTGGTCT 
59101 CGTCGC3GAA CACCTGGTCC GCGCCTGGTC GGTGCGGCAC CTCCTGCTCG 
59151 TCAQCCGGCA AGGGCCCGAC GCGCCGGACG CCCGCGACCT CGCCGACCGG 
59201 CTGGTCGGCC TGGGCGCGAC GGTACGGATC GTCGCGGCCG ACCTGACGGA 
59251 CGGGCGGGCC ACCGCGGACC TCGTCGCGTC GGTCGACCCG GCGCACCCGC 
59301 TCACCGGTGT GATCCACGCG GCCGGCGTCC TGGACGACGC CGTGGTCACC 
59351 GCGCAGACCT CCGACCAGCT GGCCAGGGTO TGGGCGGCCA AGGCGTCCGT 
59401 CGCCGCCAAC CTGGACGCGG CCACGTCGGA GCTGCCGCTC GGCTTGTTCC 
59451 TGATGTTCTC GTCCGCCGCC GGTGTCCTCG GCAACGCQGG CCAQGCCGGT 
59501 TACGCGGCCG CCAACGCCTT CGTCGACGCC CTGGTCGGCC GCCGTCGCGC 
59551 CACCGGCCTG CCCGGCCTQT CGATCGCCTG GGGCCTGTGG GCGGGCGGCA 
59601 GCGCCATGAC CGGGCAGCTG GACGAGGCCG ACCTCGCGCG GCTGCGTGCC 
59651 <3GCGGGGTCA AGCCCCTGCT GGACGAGCAG GGCCTCGCCC TCCTCGACGC 
59701 GGCGCGCGCC ACGGGGQOGC ACACCTCGCT GGTGGTOSCG GGCGGTATCG 
59751 ACGTACGCGG ACTGAACAGG GACGACGTCC CCGCGATCCT CCGCGACCTG 

59801 GCGGGCCGGA CCCGCCGCAG GGCGGCCGCC GACTCCACCG TCGACCAGGC 

59851 CGCGCTGGAG CX5GCGCCTCA CGGGCCTGGA CGAGGCCOAG CGCCGGGCTG 

59901 TCGTCACCGA CGTCGTACGC GAATGCGTGG CGGCCGTGCT CGGCCACCGG 

59951 TCGGCGGCCG ACGTACGCAC CGAGGCCAAC TTCAAOGACC TCGGCTTCGA 

60001 CTCGCTCACT GCGGTGCAGC TGCGCAACCG CCTCTCGGCG GCGAGCGGCC 

60051 TCCGCCTGCC CGCCACCCTG GCCTTCGACC ACCCCACCCC CCAGGCGCTG 

60101 GCGGCGTACC TCGGCACGCG CCTGAGCGGC CGGACCGCCA CCCCCGTCGC 

60151 ACCCGTGGCG CCTTCCGCGG CCGCGACGGA CGAGCCGGTG GCGATCGTCG 

60201 CGATGGCCTG CAAGTACCCG GGTGGAGCGA CCTCGCCGGA AOGCCTCTQa 

60251 GACCTGGTCG CGGAGGQCGT GGACGCGGTC GGCGCCTTCC CGACGGGCCG 

60301 CGGCTGGGAC CTCGAACGGC TCTTCCACCC CGACCCGGAC CACCCCGGCA 

60351 CGAGTTACGC CGACGAAGGG GCCTTCCTTC CTGACGCGGG CGAfTTCGAT 

60401 GCGGCGTTCT TCGGGATCAA TCCGCGGGAG GCQCTGGCGA TQGATCCGCA 

60451 GCAGCGGCTG TTGCTGGAGG CGTCGTGGGA GGTGTTGGAG CGTGCGGGTA 

60501 TCGACCCGAC GACGCTCAAG GGCACCCCGA CCGGCACGTA GGTCGGCGTG 

60551 ATGTACCACG ACTACGCGGC AGGCCTCGCC CAGQACGCCC AACTGGAGGQ 

60601 CTACTCCATO CTCGCCGGCT CCGGCAGCGT GGTQTCCGGC CGCGTCGCCT 

60651 ACACCCTGGG GCTTGAGGOT CCTGCGGTGA CGGTCGACAC CGCQTGCTCC 

60701 TCGTCCCTGG TCTCCATCCA CCTGGCCGCG CAAGCACTGC GACAGGGCGA 

60751 GTGCACTCTC GCCCTCGCGG GC?6GC0TGAC CGTCATGGCC ACGCCCGAGO 

60801 TGTTCACCGG ATTCTCGCGC CAGCGCGGCC TGGCCCCCGA CGQCXIGCTGC 

60851 AAGCCGTTCG CCGCCGCCGC CGACGGCACC GGCTGGGGCG AGGGTGTCGG 

60901 TGTGTTGTTG CTCGAGCGGT TGTCGGATGC GCGGCX3TCAT GGGCGTCQGG 

60951 TGTTGGGTGT GGTGCGGGGT TCGQCGGTGA ATCAGGACGG TGCGAOTAAT 
61001 <3GGTTGAaGG CX3CCX3AATGG TCCGTOGCAG GMOGOQTGA TTCGTCAGGC 
61051 GTTGGCCAGT GGTGGGTTGT CGTCGGTGGA TGTTGATGTG GTGGAGGGGC 
61101 ATOGGACOGG GACCACGTTG GGTGATCCGA TCGAGGCGCA GGCTCTGCTG 
61151 GCCACGTATG GGCAGGGGCG TCCGGTGGAT CGTCCGTTGT GGTTGGGGTC 
61201 GGTGAAGTCG AATATTGGTC ATACGCAGGC GGCTGCGGGT GTTGCGGGTG 
61251 TCATCAAGAT GGTGATGGCG ATGCGGCAT6 GT6TGGTGCC GGCGAGTTTG 
61301 CATGTGGATG TGCCGTCGCC GCATGTGGAG TGGGATTCGG GTGCGGTGCG 
61351 GTTGGCGGTT GAGTCGGTGC CATGQCCGGA GGTGQAGGGT CGTCCQCGTC 
61401 GGGCGGGTGT GTCGTCGTTC dGGGCTTCGG GAACGAATGC GCACGTGATC 
61451 OTGGAC^CTG TGCCCGATGG GCTGGGGGAG GACTCGGTAT CGGTCAGTGQ 
61501 TGAGGCTCCC GAGACTGAGA CTGACGGGCG CTTGGTGCCG TGGGTGGTAT 
61551 CGGCCCGCAG GCCGCAGGCC CTGCGCGACC AGGCACTACG CCTGCGTGAT 
61601 GCGGTGGCGG CCGACTCAAC GGTGTCGGTG CAGGATGTGG GCTGGTCGCT 
61651 GCTGAAGACG CGTGCGCTGT TCGAGCAGCG GGCGGTGGTG GTGGGGCGTG 
61701 AGAGGGCTGA ACTCCTGTCG GGGCTTGCTG TGTTGGCCGC TGGCGAGGAG 
61751 CACCCGGCTG TGACGCGGTC CCGTGAGGAC GGGGTTGCTG CGAGCGGTGC 
61801 TGTGGTGTGG CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG 
61851 CTGGTTTGTA TGAGCGGTTC CCGGTGTTTG CGGCTGCGTT TGATGAGGTG 
61901 TGCGOCCTGT TGGAGGGGCC GTTGGGCGTG GAGGCGGGTG GGTTGCGGGA 
61951 GGTGGTGTTC CGTGGCCCGA GGGAGCGGTT GGATCACACG ATGTGGGCGC 
62001 AGGCGGGGTT GTTTGCGCTG CAGGTGGGGT XGGCCCGGTT GTGGGAGTCG 
62051 OTCGGGGTGC GGCOGGATQT GGTGCTCGGG CATTCGATCG GTGAGATCGC 
62101 GGCCGCGCAT GTGGCGGGGG TCTTTGATCT GGCGGATGCC TGTCGGGTGG 
62151 TGGGGGCGCG GGCCCGTTTG ATOGOTGOGC TGCCTGAGGG CGGGGCGATG 
62201 TGCGCGGTGC AGGCCACGCC CGCCGAGCTG GCCGCCGACG TGGACGACTC 
62251 TGGTGTGAGT GTGGCGGGGG TCAACAGACC TGATTC<3ACG GTGATTTCAG 
62301 GGCCGTCTGG TGAGGTGGAT CGGATTGCTG GGGTGTGGCG <3GAGCX3TGGG 
62351 CGTAAGACGA AGGCGCTGAG CGTCAGTCAT GCCTTCCACT CGGCGTTGAT 
62401 GGAGCCGATG CTCGCGGAGT TCACCGAAGC GATACGAGAG GTCAAGTTCA 
62451 CGCGGCGGAA GGTGTCGTTG ATCAGCAACG TCTCXGGTCT GGAGGCGGGT 
62501 GAGGAGATCG CGTCCCCGGA GTACTGGGCA CGCCATGTAC GCCAGACAGT 
62551 GCTCTTCCAG CCCGGCATCG CCCAAGTGGC TTCCACGGCA GGCGTGTTTG 
62601 TCGAGCTCGG CCCCGGCCCC GTACTGACTA CTGCCGCCCA GCACACCCTG 
62651 GACGACGTAA CCGATAGGCA TGGCCCCX5AA CCGGTACTGG TGTCCTCGCT 
62701 GGCCGGTGAG CGTCCTGAGG AGTCGGCGTT CGTGOMGCG ATGGCTCGTC 
62751 TGCATACCGC TGGTGTTOCX GTGGACTGGT CGGTGTTGTT CGCGGGTGAT 
62801 CGTGTGCCTG GGCTGGTGGA GTTGCCQACG TATQCGTTCC AOCGGGAGCG 
62851 GTTCTGGTTO AGCGGCCX3TT CTGGGGGTGG GGATGCGGCQ ACTTTC3GGTC 
62901 TGGTQGCGGC GGGGCATCCG TTQTTGGGTG CGGCGGTOQA GTTCQCGGAC 
62951 CGGGQTGGGT GTCTGCTGAC CGGTCGGCTG TCGCGGTCTG GGGTOTCGTQ 
63001 GCTTGCTGAT CATQTGGTGG CGGQTQCQGT TTTGGTGCCG GGTGCTGOST 
63051 TGQTGGAOTG GGCGTTGCGO GCCGGTGATQ AGQTCGQTTG TQTGACGGTG 
63101 GftGGAGTTGA TGTTGCAGGC GCCTTTGGTQ GTGCCTQAGG CGTCGGGTCT 
63151 GCGGGTTCAG GTGOTGGTCX3 AQQROaCGGQ TGAfiOACGGG CQGCKKSaGTO 
63201 TCCASATCTA TAGCCGGCCT QACGCGGACG CCGTGAGCGG CGACOftCTCG 
63251 TGaATCrGCC ACGCJGACCGG CRCCCrCRCX CCCOAGCACA CCGAC6CTCC 
63301 GAACGACGQA CTGGCCGGCG CGTGGCCCGC GGCGGGOGCC GTGCCGGTGG 
63351 ACCTGQCXJGG CTTCTMGftG CGCGTGGCGG ACGCGGGCTA TGCGTACGGC 
63401 CCGGGGTTCC AGGGQCTGCG TGCCGTQTQQ CGGCACGGTC AGGACCTGCT 
63451 GGCCGAGGTC GTCCTGCCCXS AAGCCGCGGG TGCCCATGAfi GQCTACGGCA 
63501 TCCACCCCGC CJCTCCTCGAC GGCAOCCTCC ACCCGQCCCT GCTCCTCGftC 
63551 TGGCCCGGGG AGGTGCAGGA CGACGAOGGG AAGGTGTGGC TGGCTTTCAC 
63601 CR3GftAOC»<3 GTCTCCTTOC GGGCTCOQGG AGCCGCCSW:C GTACGCGTAC 
63651 GTCTCTCGCC CGGCGAGCAC GACGAGGCGG AACGGGAAGT ACAGGTACTG 
63701 GTGGCCGACG CCACCGGGAC CGACGTCCTG AGCGTGGGGT CGGTGACGTT 
63751 GCGTCCCGCC GACATCCGGC AACTGCAGGC CGTGCCGGGT CACGACGACG 
63801 GTCTGTTCTC GGTGGACTGG ACGCCGCTGC CGCTGTCGCG GACGGATGTG 
63851 TqGCAGACGG ATGCCGACQG GGATGCCGAC T60GTT6TGC TCTCGGACGG 
63901 TGTCGGCAGC CTGGCTGATG TGGTGTCGGC GGCGGGTGGT GAAGCGCd&T 
63951 GGGCAGTGGT CGCTCCCGTC GGTGCATCCG CGGGCGGCGG CCTTGCCGGC 
64001 TTTGACCGCC GTGAGGGTCT TGACGGTCGG CTGGTCGTGG AGCGGGTGTT 
64051 GTCAGTCGTA CAGGAGTTCe TGOCCGCGCC GGAGCTGGCC GAGTCCCGGC 
64101 TCCTCGTGCT GACCCGCGGC GCCGTGGCGA CCGGCGGCGA CGGCGACGGT 
64151 GATGTGGACG CGTCCGCCGC AGCCGTATGG GGCCTGGTCC GCAGTGCTCA 
64201 GTCCGAGAAC CCGGGCCGCT TCATCCTGCT CGACGTGQAC ATGGACGTGG 
64251 ACGTCGACGX GGACATGGAC GT6GACGTCG ACGTGGACGT CGACGTGGAC 
64301 GTGGACGGAG ACGGCAATGG CAGCGACCTG GACCOGGACC TGAACGGCCG 
64351 ACGACTTCCC CACGCCACCC TGCGTCACGC CGCCGAGGAA CTCGACGAGC 
64401 CCCAACTCGC CCTGCGCGAC GGACAACTGC TCGTTCCGCG GCTGGTCCGC 
64451 GCCACCGGCG GCGGACTCGT CGTGGCGCCC ACCGACCGTG CCTGGCGCCT 
64501 GGACAAGGGA AGCGCCGAGA CGCTGGAGAG CGTCGCGCCG GTCGCGTACC 
64551 CCGGAGTCAT GGAACCCCTG OGCCCCGGCC AGGTCCGCCT CGGCATCCAC 
64601 GCCGCGGiSCA TCAACTTCCG CGACGTCCTG GTCAGCCTCG GCATGGTGCC 



64651 CGGCCAGOTC GGCCTGGGCG OCQAAGGCOC CGGTGTCGTG ACGGAGACAG 

64701 GCCCCGATGT CACCCACCTG TCGGTCGGCG ACCGCGTGAT GGGCGTCCTC 

64751 CACGGCTCCT TCGGCCCGAC GGCCGTGGCG GACACCCGCA TGGTCGCGCC 

64801 GGTTCCGCAG GGCTGGGACA TGCGGCAGOC GGCGGCGATG CCCGTOGCGT 

64851 ATCTGACGGC TTGGTACGGG TTGGTGGAGC TGGCCGGTCT GAAGGCGGGC 

64901 GAGCGGQTGC TGATCCACGC AGCtACGGGT GGTGTOGGAA TGGCGGCGGT 
64951 QCAGATCGCC CGTCACCTGG GTGCCX3AGGT GTTCGCCACC GCCAGTGCAG 

65001 CCAAGCACGT CGTACTGGAA GAGATGGGCA TCGACGCCX3C CCACCGCGCC 

65051 TCCTCCCGGG ACCTCGCCTT CGAGGACACC TTCCGGCAGG CCACCGACGG 

65101 GCGCGGCATG GACQTCGTCC TCAACAGCCT GACCGGCGAG TTCATCGACG 

65151 CATCTCTGCG GTTGCTCGGC GACGGCGGCC GGTTCCTGGA GATGGGCAAG 

65201 ACCGATGTGC GCACGCCGGA GQAGGTGGCC GCGGAGTACC CGGGTGTdkC 

65251 CTACACCGTG TACGACCTCG TCACCGACGC GGGGCCGGAT CGCATCGCGG 

65301 TCATGATGAG TGAGCTGQGC (^AGGTTCG CTTCCGGTGC CCTTGACCCT 

65351 CTGCCGGTGC GTTCCTGGCC GCTGGACAAG GCGCGTGAGG CGTTCCGGTT 

65401 CATGAGTCAG GCCAAGCACA CCQGCAAACT CGTACTCGAC GTGCCCGCAC 

65451 CGCTCGACCC CGACGGGACC GTCCTGATCA CCGGAGGCAC GGGGGCGCTG 

65501 GGGCAGGTCG TGGCCGAGCA TCTG6TGCGQ GAGTGGGGCG TACGGCACCT 

65551 GGTGCTGGCC AGCCGCO^TO QACTGGACGC CCCCGQCAGC GGTGAACTCG 

65601 CCGACAGGCT GTCGGACTTG GGCGCCGAGG TGACCGTCGC GGCGGCCGAT 

65651 GTGAGCGACC CGGCCTCGGT GGTGGAaCTG QTCGGC2AAGA CGGATCCCTC 

65701 GCATCCGTTG ACGGGTGTCG TGCACGCGGC GGGCGTGCTT GAGGACGGGA 
65751 TCGTGACGGC TCAGACGCCT GAGGGGCTGG CGCGGGTGTG GGCGGCCAAG 

65801 GCCGCTGCGG CGGCGAATCT CCATGAGGCG ACCCGQGAGA TOCGTCTCGG 

65851 TCTGTTCGTG GTGTTCTCCT CGGCGGCCGC CACGCTCGGC AGTCCGGGCC 

65901 AGGCCAACTA. CGCGGCTQCC AATGCCTATT GTGACGdGCT GATGCAGCGC . 

65951 CGACGGGCGG CGGGCCAGGT CGGCCTGTCG GTCGGCTGGG GTCTCTGGGA 

66001 GGCACCGGAC GCCAAGCCGG GTGTTGCCGC CGACGCCAAA CCGGATGTTG 

66051 CCGCCGACGC CAAGACGGGA GTTGCCGCCG ACGGCACTCC CCAGGGCATG 

66101 ACCGGCACCC TGAGCGGCAC CGACG1X3GCC CGCATGGCAC GCATCGGGGT 

66151 CAAGGCGATG ACCAGCGCAC ACGQTCTCGC CCTGCTCGAC GCCGCACACC 
66201- -.GCCACGGCCG C<XICX:ACCTC GTCGCCGTCG AOCTCGACAG CCGCGTCCTG 
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66251 GCGCACAAAC CCGCCCCGGC CCTCCCCGCC GTCCTGCGCG CCTTCGCCGG . 

66301 AGACCAGGQA. GGCCAGGGAG GCGGCCGAGG CGGCGGTCGG GGCGGCGGCC 

66351 CGGCACGACC GGCGGCCGCC ACCACCCGGC AGAACGTCGA CTGGGCCGCG 

66401 AAGCTCTCCG TCCTGACAQC CGAGGAACAG CACCGCACCC TCCTCGACCT 

66451 GGTACGGACG CACGCGGCAG CCGTCCTCGG GCACGCGGGC ACCGACGCCQ 

66501 TACGCGCCGA CGCCGCCTTC CAGGATCTCG GCTTCGACTC CCTGACCG^G 

66551 GTCGAACTGC GCAACCGCCT CTCXXICCTCC ACCGGCCTGC GCCTGCCCGC

66601 CACGTTCATC TTCCGGCACC CGACCCCGTC GGCGATCGCC GACGAACTGC 

66651 GCGCACAGCT GGCCCCCGCX3 GGGGCCGACC CGQCCGCGCC GCTCTTCGGT 

66701 GAACTGGACA AGCTGGAGAC GGTGATCACG GGGCACGCGC ACGACGAGAG 

66751 CACCCGGACC CGCCTGGCGQ CACGCCTGCA GAACCTGCTG TGGCGCCTGG 

66801 ACGACACTTC .GGCCCGCTCG OACCACGCGG CCGGCGCGAG CGACGCCGAC 

66851 GGCGACOCCG TCGAGAACCG AGACCTCGAG TCCGCGTCGG' ACGACGAGCT 

66901 CTTCGAGCTG ATCGACCGAG AACTGCCTTC TTGATCAGGA GTGGAGAA6A 

66951 CAXGCCGGGT ACGAACGACA TGCCGGGTAC CGAGGACAAG CTCCX3CCACT 

67001 ACCTGAAGCG AGTGACCGCG GATCTCGGAC AGACCCGTCA GCGCCTGCGC- 

67051 GACGTGGAGG AGCGCCAGCG GGAACCGATC GCCATCGTCX3 CGATGGCCTG 

67101 CCGCTACCCG GGCGGGGTGG CCTCCCCCGA GCAGCTGTGG GACCTGGTCG 

67151 CCTCACGCGG CGACGCCATC GAGGAGTTCC CCGCCGACCG CGGCTGGGAC 

67201 GTGGCGGGCC TCWCCACCC CGACCCGGAC CACCCCGGCA CGACCTATGT. 

67251 ACGAGAGGCC GGATTCCTGC GGGAOGCCGC CCGCTTCGAC GCCGACTTCT 
67301 TCGGCATCAA CCGGCGCGAG GCGCTCGCCG CCGACCCGCA GCAACGGGTG ■ • • 
67351 CTCCTCQAAG TGTCGTGGGA ACTGTTCGAG CGGGCGGGCA TCGACCCCGC 
67401 CACGCTCAAG GACACCCTCA CCGGCGTGTA CGCGGGGGTG TCCAGCCAGG 

67451 ACCACATGTC CGGG^lGCGGG GTCCCGGCGG AGGTCGAGGG CTACGCCACC
67501 AGGGGAACCC TCTCCAGCQT CATCTCCGGC CGCATCGCCT ACACCTTCGG
67551 CCTGGAGGGC CCQGCGGTGA CGCTCC3ACAC GGCGTGCTCG GCATCGCTGG
67601 TCGCGATCCA CCTCGCCTGC CAG6CCCTGC GCCAGGGCOA CTGCGGCCTG 
67651 GCGGTGGCGG GAGGCGTGAC CGTACTGTCC ACGCCGACGG CGTTCGTGGA 
67701 GTTCTCACGC CAGCGCGGAC TCGCACCGGA CGGCCGCTGC AAGCCGTTCG ■ - 
67751 CCGAGGCCGC CGACGGCACC GGATTCTCCQ AGGGCGTCGQ CCTGATCCTC 
67801 CTGGAACGCC TCTCCGACGC CCGCCGCAAC GGACATCAAG TACTCGGCbT 
67851 CGTACGCGGA TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGCCTGACCG 
67901 CCCCGAACGA CGTCGCCCAG GAACGCGTGA TCCGCCAGGC CCTGACCAAC 
67951 GCCCGCGTCA CCCCGQACQC CGTCGACGCC GTGGAGGCAC ACGQCACCGG - 
68001 CACCACGCTC GGCGACCCGA TCGAGGGGAA CGGACTCCTC GCGACGTACG 
68051 GAAAGGACCG CCCCGCCGAC CGGCGGCTQT GGCTCGGCTC TGTGAAGTCG • 
68101 AACATCGGCC ACACGCAGGC GGCTGCGGGC GTCGCAGGCG TCATCAAGAT 
68151 GGTGATGGCG ATGCGCCACG GCGAGCTGCC CGCCTCCCTG CACATCGACC 
68201 GGCCCACGCC CCACGTGGAC TGGGAGGGCG GGGOAQTGCG GTTGCTCACC
68251 QATCCCGTGC COTGGCCACG GGCCGACCGC CCCCGCCGCG CGGGGGTCTC
68301 CTCCTTCGGC ATCAGCGGCA- CCAACGCCCA CCTGATCGTG GAACAGGCCC 
68351 CCGCCCCGCC CGACACGGCC GACGACGCCC CGGAAGGCGC CGCAACGCCC 
68401 GGCGCTTCCG ACGGCCTCGT GGTGCCGTGG GTGOTGTCGG CCCGTAGTCC • 
68451 GCAGGCCCTG CGTCATCAGG CCCTGCGTCT GCGCGACTTT GCCGGTGACO 
68501 GGTCCCGAGC GCCGGTCACC <3ACGTGGGCT GGTGTTTGCT GCGGTCGCGT • " 
68551 GCGCTGTTCG AGCAGCGGGC GGTGGTGGOQ GGGCQTGAGA GGGCTGAACT 
68601 QCTGGCGGGG CTGGCTGCGT TGGCCGCTGG TGAGGAGCAC CCGGCTGTGA
68651 CGCGGTCCCG TGAGGAW3CG GCGGTTGCTG CGAGCGGTGA TGTGGTGTGG 
68701 CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG CTGGTTTaTA 
68751 TGAGCQGTTC CXX5GTGTTTG COGCTOCGTT TGATGAGGTG TGCGGCTTGC 
68801 TGGAGGGGGA <3CTGGGGGTT GGTTCGGGTG GGTTGCGGGA GGTGGTGTTC 
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68851 TGGGGCCCGC GGGAGCGGTT GGATCACACG GTGTGGGCGC AGGCGGGGTT 

68901 GTTTGCGTTG CAGGTGGGGT !I?GGCCCGGTT GTGGGAGTCG QTCGGGGTGC 

68951 GGCCGGATGT GGTGCTCGGG CATTCGATCG GTGAGATCGC GGCCGCGCAT

69001 GT6GCGGGGG TCTTTQATCT GGCGGATGCG TQTCGGGTGG TGGGGGCGCG 

69051 GGCGCGTTTG ATGGGTGGGT TGCCTGAGGG TGGGGCGATG TGTGCGGTGC 

69101 AGQCCACGCC CQCCGAQCTG GCCGCGGATG TGGATGGCTC GTCCGTG&T 

69151 GTGGCGGCGG TCAACACACC TGACTCGACG GTGATTTCAfi • GTCCGTCGGG 

69201 TGAGGTGGAT CGGATTGCTG GGGTGTGGCG GGAGCGTGGG CGTAAGACGA

69251 AGGCGCTGAG CGTGAGTCAT GCTTTCGATT CGGCQTTGAT GOAGCCGATG 

69301 CTCX3GGGAGT TCACGGAAGC GATACGAGGG GTCAAGTTCA GGCAGCCGTC 

69351 GATCCCGCTC ATOAGCAATG TCTCCGGAGA GCGGGCCGGC GAGGAGATCA 

69401 CATCCCCGGA GTACTGGGCG AGGCATGTAC GCCAGACAGT GCTCTTCCAG 

69451 CCCGGCGTCG CCCAAGTGGC CGCTGAGGCA CGCGCGTTCG TCGAACTCGG 

69501 CCCCGGCCCC OTACTGACCG CCGCCGCCCA GCACACCCTC GACCACATGA 

69551 CCGAGCCGGA AGOCCCCGAQ CCGQTCGTCA CCGCGTCCCT CCACCCCGAC 

69601 CGGCCGGACQ ACGTWCCTT-CGCGCAeGCC ATGGCCOACC TCCACGTCGC 

69651 CGGTATCAGC GTGGACTGGT CGGCGTACTT CCCTGACGAC CCCGCCCCCC

69701 GCACCGTCOA CCTGCCCACC TACGCCTTCC AGGGGCGGCO CTTCTGGCTG ' 

69751 GCGGAGATCG CGGCGCCCGA GGCCGTGTCC TCGACGGACG GTGAGGAGGC 

69801 CGGGTTCTQG GCCGCCGTCG AAGGTGCGGA CTTCCAGGCG CTCTGCGACA

69851 CCCTGCACCT C31AGGACGAC GAGCACCGCG CGGCTCTGQA GACQGTGTTC 

69901 CCCQCGGTCT CCGCGTGGCG GCGCGAACGA .CGTGAGCGGT CGATCGTCGA 

69951 TGCCTGGCGG TACCGGGTCG ACTGGCGGCG CGTOGAGCTG CCGACACCCG 
70001 TTCGGGGCGC CGGTACGGGT CCCGACGCOS ACACGGGCXIT CGGGGCGTGG 

70051 CTGATGGTGG. CTCCCftCGCA -CGGGTCGGGT ACTTGGCC3GC AAGCCTGTGC 
70101 CCGGGCGTTG <3AGGAGGCGG GCGCGCXX3GT ACGTATCGTC GAQGCCGGCC 
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70151 CGCACGCCGA CCGGGCGGAC ATGGCGGI^CC TGGTCCAGGC ATGGCGGGCA 

70201 AGCTGTGCQG ACGACACCAC .CCAGCTCGGA GGAGTGCTCT CCCTGCTGGC 

70251 TCTCGCCGAG GCACCGGCCA CCAGTTCCGA CACCACTTCC CACACCAGTA 

70301 CCAGTTGCGG TACCGGCTCT CTCGCGTCCC ACGGCCTCAC CGGCACCTTG 

70351 ACGCTGCTGC ACGGTCTGCT GQATQCGGGC QTCQAAGCGC CTCTCTGGTG 

70401 TGCCACGCGC GGCGCCGTGT CGTGCGOCGA CGCC6ATCCG CTCGTCTcicc 

70451 CGTCGCAGGC • CCCGGTCTGG QQACTCGQAC QCGTGGCCGC CQTGGAG.CAT 

70501 CCGGAGTTGT GGGGCGGCCT ^TCGACCTG CCCGCCGACC CGGAGTCGCT 

70551 CGACGCGAGC GCGTTGTATQ CGGTTCTGCG CGGAGACGGC GGCGAGGATC 

70601 AGGTCGCGCT GCGCCGGGGC GCGGTCCTCG GCCGTCQCCT GGTGCCCGAC 

70651 GCAACCCCGG ACGTGGCCCC CGGCTCGTCC CCGGACGTGT CCGGAGGCGC 

70701 AGCCCATGCC GACGCGACCT CCGGGGAGTQ QCAQCCGCAT GOTQCCGTCC 

70751 TCGTCACCGG AGGCGTCGGC CACCTGGCCG ATCAGGTCGT ACGGTGGCTC 

70801 GCCQCGTCCG GCGCCX3AACA CGTCGTACTC CTGGACACGG GCCCCGCCAA 

70851 CAGCCGTGGT CCCGGCCGGA ACGACGACCT CGCCGCGGAA GCCGCCGAAC 

70901 ACGGCACCGA GCTGACGGTC CTGCGGTCCC TGAGCGAGCT GACAGACGTA 

70951 TCCGTACGTC CCATACGGAC CQTCATCCAC ACATCGCTGC CCGGCX3AGCT • 

71001 CGCGCCGCTG GCCQAGGTCA CCCCCGACGC GCTCGGCGCG GCCGTGTCCG 

71051 CCGCCGCGCG GCTGAGCGAA CTCCCCGQCA TCGGQTCAGT GGAGACCGTG 

71101 CTGTTCTTCT CCTCCGTGAC GGCTTCGCTC GGCAGTAGGG AGCACGGCGC 

71151 GTACGCCOCC GCCAACGCCT ACCTCGACGC CCTGGCGCAA CGGGCCGGTG

71201 CCGATGCTQC GAGCGCCCQG ACGGTCTCGG TCGGOTGGGQ CATCTGOGAT 

71251 CTGCCX3GACG ACGGTGACGT GGCACX3CGGC GCCGCCGGGC TGTCCCGGAG 

71301 GCAGGGACTC CCGCCGCTOG AACCGCAGTT , GGCGCTCGGC GCCCTOCGCG 

71351 CGGGGCTCGA CGGGGGCAAG -GGGCACACGC, TGGTCGCCGA CATCGAGTGG 

71401 GAGCGGTTCG GGCCGCTGTT CACGCTGGCC AGGCCCACCC GGCTGCTCGA

71451 CGGGATCCCC GCGGCCCAGC GGGTCCTCGA CGCCTCCTCG GAGAGCGCCG

71501 AGGCCTGGGA GAACGCCTCG GCCCTCCGTC GCGAACTGAC GGCGCTGCCC

71551 GTGCGGGI^C GGACCGGGGC ACTTCTCGAC CTGGTCCGCA AACAGGTGGC

71601 CGCCGTCCTG CGCTACGAGC CGGGCCAMA* CGTGGCGCCC GAGAAQQCCT

71651 TCAAGGACCT GGGCTTCGAC TCGCTCGTGG TCGTGGAGCT GCGCAACCGG

71701 CTGCGCGCCG CCACCGGGCT CCGGCTGCCC GCCACCCTGG TCTACGACTA

71751 CCCCACACCC CGCACCCTCG CCGCACACCT GCTGGACAGG GTGCTGCCCG

71801 ACGGCX3GCGC GGCAGAGCTC CCCGTGGCCG CCCACCTGGA CGACCTGGAG

71851 GCGGCCCTCA CCQACCTGCC GGCCGACGAC CCCCGGCGCA AGGGCCTGGT

71901 CCGGCGTCTA CAGACGCTGC TGTGGAAGCA GCCCGACGCC ATGGGGGCGG

71951 CGGGCCCCGC CGACGAGGAG (SIGCAAGCCG CGCCCGAGGA CCTGTCGACC

72001 GCGAGCGCCG ACGACATGTT CGCCCTGATC GACCGGGAGT GGGGCACGCQ

72051 GTGAGCGGGG TGGAGCGGGG TGTGGGGTCG GCGGGCCCTG TGGAACAGGG

72101 TGACGGACTC GCGGGCGTGG TCGAGCGGGC CGAGGCQCTG GCCGCTCTGC

72151 GGQGCGCCTT CGACGGCTCC CCGGGCACCG GCGGCAGCCT CGTCGTGCTC

72201 AGCGGCGCGG TGGGCACCGG CAAGACCGCO CTOCTACGGG CGTGGGCCGA

72251 CCGCATCGGC GCCGATGCCG ACGCCCTGGT CCTGACCGCC ACCGCCTGCC

72301 GCGCCGAGCG C6ACCTGCCG CTTGGCGTCC TGGAACAGCT GGTACGCAGC

72351 CCCGGCCTGC CCCCGGCCAG CGCCGAGCGC GCQCTGOCGT GGTGGGACGA

72401 GGAGGCCTCG GCCACCCCCG GAAAGACGGA CGCGAACGGG ACGAGTGCCA

72451 ACQGOACGGA CGCCAACGGG ACGGGCSGC3GG GACAGACGGG CGCGGGGCAG

72501 GCGGGCGTGG GACAGACGGG CGTGGGCGGA GAGCCCGTCC TGGCCGCCTC

72551 CX3CCCTGCGA GGCCTGTGCG AGGTGCTGCG GGACC^GCTC GCCGAGCGGC

72601 CCGTCGTGGT CGCCGTCGAC GAOGCGCACC ATGCCGACGC GGCGTCGCTC

72651 CAGTGCCTGC TCTCCGTGGT GCGGCGGCTG OGGTOGGCAC GACTCCATGT

72701 GCTGTTCACC GAGTACGCCC ATCAGAftGQC GCAGAAOGCC CTGCTGAGCA

72751 GCGAGTTCCT GCACGAGCCC GCCGTGCGGC GGATCCGCCT GGAACCGCTG

72801 TCOAAGGCGG GCGTGGAGGC CTTGCTCGCC CGGCACCTCG ACGAGCGGAC

72851 GGCACAAGAC CTCACCCCCG TCGTCCACGG CATGAGCGCG GGCCACCCGC

72901 TCCTCGTACG GGGGCTGGCC GAGGACCACC GTGCGGCGGG CGGCGCCGGG

72951 GAGGCGTACG GTCGTGCCGT CCTCAGCTTT CTGTACCGGC ACGAGACTCC

73001 GGTCACCCAA GTCGCCCGCG CCATCGCTGC GTTGOGCGCG CACGCCGGAC

73051 CCGGTCAGGT CGGGCGGCTG CTCGATGTCG ACGCGGCGTC CGTCGAGCGG

73101 GCCGTGCGGC AGCTGACCGT CGCGGAGGTG CTGCACGAGO GCCGCCTGTG

73151 CCACCCGGCG TTCGCGGCGG CGGTCCTGGA CGGCATGCCG CCCGAGGAAC

73201 GCCGCGCCCT GCACGGACGG GTCGCCGACC TCCTGCACGA GGAGGGGGCG

73251 CCGGCCACCG AAGTGGCCGC CCACCTCGTC GCCGCCGACC GGTCCGACGC

73301 CCCGTGGGCG GTACCCGTCT TCCAGGAAGC GGCCCAACTC GCCCTGGACG

73351 AGGACCAGGT GGAGACCGGC GTCGACTATC TGCGCGCGGC CCACCAGCGG

73401 TGCCGGGGCG CCGCGCAGCG TGCCGCGGTG GTCGGTGCGC TCGCCGACGC

73451 CGAGTGGCGG CTCGACCCAG CAAAGGTCCT GCGCCACCTG CCCGACCCTG

73501 CAQCCATGGC CCCACAAACG GACCCTGCCG CCCTGQCCCC ACACAGGGAC

73551 CCCGCACCCA CAGCCGCACC CACAGCCGCC CCCACCCCCA CCCCCATCCC

73601 GACCACCCCA CCCCTCCCCA CCCACCTGCT CTGGCACGGG CGGGTCGAGG

73651 AAGGGCTGGA CGCCATCGQC ACGCTCACCG GGCCCGGACC CAAGCCGGCG

73701 GGTGCGCCGC CGATGAACCC CGGGQACCTG GACACCCGAT GGCTGTGGGG

73751 CGCCTACCTC TATCCCGGGC ACX3TCAAGGA GCGCCTGQGA TCCQGCGCCC

73801 TGTCCCCGCA GCGCTCGACC CCGCCGGCGG TCACGCCGGA GCTCCAAGGC

73851 GCGGGCACGC TGATGAACGA CCTGCTGCAC GGCGGC3GAAC GCGACGCCAC

73901 OGAGGCX5CC GAGCGCGCCC TCAACCGCTA CCGGCTCGGC CCCOGCACCA

73951 TCGGGGTGCA GAGGGCOSCG CTGGCGGCCC TCACCTACGG CGACCGGCCG

74001 CACCGCGCGG CCGCCTGGTG CGACGGCCTC QTCGCCCAGG CCGACGAGCG

74051 CAACAGCCCC ACCTGGCGC3G CCCTGTTCAC CGCGTGGCGT GCCCTGCTCC
74101 ACCTGCGGCA GGGCGACCCG GCCGCAGCGG .AACAGCGCGC CGAAACCGCC 
74151 CTCGCCCTGC TCGGATCGAA GGGCTGGGGC GCCGCGATCG GCCTGGCGCT • 
74201 GGCAGCCGCC GTACAGGCCA' AGGCGGCCCT CGGCGATGTC GACGGGGCGG 
74251 CGGCCCTCCT GGAACGGCCC GTGCCCCAGG CGGTCTTCCA GACCCGCACC 
74301 GGACTQCACT ACCTGGCGQC CCGGGGCCX3C TATCACCTCG CCACCGGCTG 
74351 CCACTACGCC GCACTGTGCG ACTTCTACGC CTGCGGGACC CGCATGAGCA 
74401 GCTGGGGAGT GGACCTGCCC GCGCTGGAGC CGTGGCGCCT CGGCGCGGCG 
74451 GAAGCGTACC TGGCCCTCGG CGAAGGACTC CTGGCACGCC AACTCGTCGA ' 
74501 CGGCCAGCTG CCGTTGCGCA CGCCTGACQA CGGCCGCACC TGGGGCATGA 
74551 CGTTGCGCCT GCGGGCGGCC ACGTCCCCCG CGCCGGCCCG GGCCGAACTC 
74601 CTCGACGAGG CCGTGGCGGT GCTCCGGGAG AGCGGCGACA CCTTCGAGCT 
74651 GGCGCGGGCC GTCGCCGACC AGGCTGTTGC CGTACGCGAA GGGGGCGAGG 
74701 CGGAACGCGC CCGGCTGCTG GCCCGCAAGG CGGAGCTGCT GGCCCGGCGC 
74751 TGGGGCAGCG CCCCCQCGCC CGCCACCGTC CCCGAACCGC CGGAGCGGCC 
74801 AGGACCGGCC ACTCCGGACG CCGAACTGAC CAGTQCGGAG CGGAGGGTOQ 
74851 CCGAGCTGGC CGCCGAAGGG TTCACCAACC GGGAGATCTC CCGGAAGCTG. 
74901 TGCGTCACGG TCAGCACCGT GGAACAGCAC CTGACCCX3GA TCTACCGGAA 
74951 -GCTCGACGTC AGGCGACTGG ACCTCCAGGC AGCCCTCGGC TGACCTTCAG 
75001 • GCGGCCCTCG GCTGACCGCA <3GCCACGCGC CTACGGTCAG CCTTCCTGAG 
75051 TCAGGACCGT ACAGCCGCCG TAGGTGTAGG TQTAGGCGTQ GGCGAGATCG 
75101 TCGCCGCGTC CAGACCCACC ACGGCCAGCT CCTCCGGAAG GAACGGGGGA 
75151 GCGGTCAGCT CCGGGAGGCG TTCGTCG6CG CGCATCGCCA TCAGGAAACG 
75201 GTT<3GAGCCC AGTTCGGCCT GGGGCGCGTT GAGGCTCATC ACGTC5CQTGA 
75255L OGATCTCGGA OGCCTTCGGG <3AACGGATCG ACGCCGCGGT <3ATGGCCTCG ' 
75301 <KX3AACCGCA GACGCTGGTC GGTCTCCACA GCQATC^AGCC QCGGATCCGT 
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75351 CGCCGAGACA CGGCAGTTGA CGTAGTCGAT GTCCTTGGTC GCGGCGAGGA 

75401 TCCACGGGTC GTCCACGGCC GCGCCGATCG CCXTCTGCAG GGCGCGGGTG 

75451 CCGGCGCGGG CGGACCCCGT ACCCTCCTQC ACGCTCCGCT CGAACTCGCG 

75501 QTCGATCGTG GTGGCOCAGC GCQCGGCCGA GCTCATGCCG TGGCCGTAGA ' 

75551 TCGGGTTGAA AGCGGTCAGC GAGTCX3CCGA TGACGAGCAG ACQGTCGGGC 

75601 CACTGTTCGA GGCGCTCCGQ ATAGAGGCGQ CGGTTGQCQC CGQAGCgSsA 

75651 ACCGAAGACG GGGGTGAGTG GTTCGGCGTC CCGGAGCAGQ TCGGCGAGGA 

75701 TCGGGTGGTT CAGGTTCTCG GCGAAGGGGA TGAACTCGTC CTCGTGTGTG 

75751 GGCAGTTGCG CGCCCCGCGT GCAGQAGAQC GTCGCQAGCC AGCGGCCGCC 

75801 CTCGATGGGG TAGACCAGGC CGAAGCGGCC GGGTTCGCGC ACCCGGTCGT 

75851 CGGCGGCGAT GTTCACGGCG GGGAAGTGCG TCGTAGCGCC CGGCGGGGCC 

75901 TTGAAGAGCC GGGTGGCGTA GGCGACGCCC GCGTCCACGA CGTCTTCCTC 

75951 CAGTGCCGGC ACGCCGAGGG CGGCGAGCCA CTGCTTGAGG CGGGAGCCGC 

76001 GCCCGQTGGC GTCQATCACC AGGTCGGCCT CXIAGCTGCTC CTGGCGACCG 

76051 CTGTCGAGGT CGCGGACGAC GACACCGGTG ACCCGGCCGC CACTGCCACC . 

76101 ACCACTTCCC GTCAGCTCGA CGGCCTCGGT GCGCTQCCGG ACGGTGATGT 

76151 TGTCGGCTCG CAAGGCCTGC TQACGTACCG TCAAGTCCAG CAGCGGGCGG 

76201 CTGGCGACCA GCGCGAACXG GGTGGCGGGG AAGCGGTGCT GCCACCCCTG 

76251 AGCGGTCAGC GTCACCAGGT CCTCGGGGAA GCCGAGGCGG CGGGCGCCG6 

76301 CCGCGAGGAG GCGGTCGGTG GTGCCGGGCA GCATCTCCTC GATGAGGCGG

76351 GCGCCGTTGG ACCACAGGAG GTGGGCGTGG CGGQCCTGCG GGACCCCCTT 

76401 GCGGTGCTGG GGCTCCTCGG GCAGCGCGTC ACGTTCCACG ACGGTGACGG 

76451 CGTCGAC6TG CCGGGCCAGG ACGTGGQCCG ^CAGGGTQCC TGCCATGCTG 

76501 GCACCGAGGA CGACGGCATG TGCGGGTCGG GTGGTGGTCA CGCGCGTATC 

7^551 CCTTCGGGGT <3GGTGGTGTC GGCGGGCCCG GCGGGATCGT CCATGGTCAC • 

76601 GTCCGTGACG CCCCAQAACG <X:TGGAC0CG GGGGCCX3AGC CCGTGCTCOT - 
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76651 CGAGTTCGAC GATGCCGACG ATGCGGAAGG TCATCGGCCG CGGCCGCTGC 

76701 ACGGTGACCG TGGTCGGCGT CACCACGAAA CGGTCGTCCA TCGACGTCAT 

76751 CGGCGGGTCC GGGACCTCGT GCGTACCGCA GGAGACGGCC AGTTGGAGAT

76801 GGCGGCGGAG ATCGTCCTTG CCCACCATCG GGGGCCGCCC CACGGGGTCC* 

76851 TCGAAGACGA TGTCGTCCGT QAACAGGTCG. AGOACGCCTT CGATGTCACC 

76901 GGCGTTGATG CGCTCGGCGT AGTCGACGGC CATCTGCTTG CGCGCGGCfcT 

76951 CGTCGGGCAT GGCACCTCCA GGAAGGGTGG GCAGACCTTG TGAAAGTCAT 

77001 CGAGGGCCGT TCGGTTCAGC CGAGGACCGT GAGATCGGAT GTGCCCCAGT 

77051 ACOACTTCAG ATGCGGGATG AGGCCGGACG CGTCGATGCG GATCACGAGC 

77101 ATCGCCGTGC GGTGTATGCG GGCCGTCCCC GGGGCGTCGG GGGCCTTGAG 

77151 CCAGCCCCGC TCCGCGTAGA GCGGGCCCAC GGGCAGGTAG" TCCATGACGG 

77201 AGGAAATCTG GATCAGCGCG TGCGTOGCGT CCTGCCCGGC GACGGGCTCG 

77251 GCCGCCTCCT CGCGCAGGTG CGCGGCGAGC AGCGGTTCGT AGTGGGCGCG 

77301 CAGCGCGTCQ TGCGCGGTQA CGGGCGGGAQ GCCGACCGGQ TCCTCGAGGA 

77351 CCGCGTCGGG CGCGTACAGA TCGATGATCG CGTCCAGGTC CCCGGCGTTG 

77401 ATCCGCCGGC TGTGCTCCAG GGCCCGCTTC TTGCGGGCGA ACTCGTTCAT 

77451 CGCTGCCCCT CCACTGCCTG ACCGTGTCCG TTGCCGTTGC CGTTGCCX3TT 

77501 GCCGTTGCCG TGTCCGTTGC CCTGCCCGGT GGGCTGTCCQ TTGCCCTGTC 

77551 CGCTCGCGCC GTCCCTOCCG AGGTCCCGGT CGATGAACGC GAAGATCTCG 

77601 TCCGCCGACG CGTCCTGGAT ACGTGTACGA GTGGCCAGCG GGACCTCGCC 

77651 GGCOGTOTCC TGCGGOGCGT CGAGCCTGGC CAGCGTOHCa CGCAGCCGCC 

77701 CCGCCAGTTC GGCCCGCGCC GAGCCGTCCT TCGAGGAGAC CGAGAGCAGC 

77751 GAGTCCTCGA TGCGCTCGAA CTCCGCCAGG ACGTCGGCGA GCGGATCGGC 

77801 CGCGCGCGGG GCCAGCTCCT GCCGCAGCTG CGCQGOQAGC TCCGCCGGGT 

77851- -EOGGATGGTC GAAGACGAAC <3TGGCGGGCA. GCTTCAGCCC CGTCGGGGCC 

77901 GAGAGCGGGT TGCGCAGCTC CACCGCGGTC AGGGAGTCGA AGCCGAGTTC

77951 CCGCAGCCCC TGCGTCGCGT TGACGGGCGT GGCCGCGTCG TAGCCGAGGA

78001 CGGCCGCOAT ATGGQTGCAC ACCAGGTCGA GCAGCGCCTC CTCCCGCTCG 

78051 GGGTCGGACA TCGCGCCGAG CGACTTGAGC AGCGCGGCCG CCCCCGCCGA 

78101 CACGGCACCG CCQCCGCTCT TGCTCCCCCC GCGCACCAGG . TCGCGCAGCA 

78151 GCGCCGGTGC GGGGTGGCTC TGGGCCTGCC GGCGCATCCG QGCCAGGTCC 

78201 AGACX3GACCG GCGCGTACAG GGGCAGTCCG CCGGCCCACG CCQCQTCGAQ 

78251 GAGGGCGAGT CCCTCGTCQG CGCCGAGCCC GACCACGCCG GCGCGGGCAT 

78301 GGCGCGCCCG GTCGGCGTCG GTGAGCCGTC CCGACATGCC GCTCGCCAGC 

78351 TCCCAGTAGC CCCACGCCAQ GGAGGTCGCC GCCGCACCGC CGTCGTGCCG 

78401 GTGCCGGGCC AGCGCGTCCA AGAAGGCGTT GGCGGCCGTG TAGCTGCCCT 

78451 GGCCGGGGCC GCCGAGCA6C CCGGCGACCG AGGAGTACAG GACQAACGCG 

78501 GACAGGTCCG CGTCCCGCGT CAGCTCGTGC AGGTGCCACG CGGCGTCCXIC 

78551 CTTCACGCX3C ATCACCTCCT CGACCTGCTC GGCCGTGAGQ TTCTGCACCA 

78601 CGGCGTCGTT CACGGTGCCC GCGCAGTGGA AGACGGCGGT CAGCGGGTGG 

78651 TCCGAGGQCA CCX3CCGCGAG GAGGGCGGCG GCTTGGTCCC GGTCGCCCGG 

78701 GTCGCACGCG GOGAAGGTGA CTCGCGCGCC GAGOGCGGAG AGGTCGGOGG 

78751 CCAGTTCGAG TGCGCCCGGC GCGTCGGCTC CCCGCCTGCT GGACAGCAAC

78801 AGGTGCCTGG CTCCGTACCQ TTdCACCAGG TGACGGGCCG TCAGCGAGCC 

78851 GAGXGCTCCG GTGCCGCCGG TGACCAGCAC GGTaCCCTCO' GGGTCOAAGO 

78901 CGGGAGGCAG CGAGAACACG GTCGTGCCGG CCGAGGGCGG GGCCGCCATC

78951 GCGGCGGGCG CCTQCCGQAT GTCCCACACG GTGATGTCGA GCGGCCTCAG 

79001 AGCACGGCTG TCACCCCGTT CCGOGQG.CAG CCCGGGCTCC GCCGACTCCG 

79051 TGATCTCGGC AAGCTCGGTC AGCTCX30TCA GCTCCGCGAQ GATTTCCCXST 

79101 ACGCGCCCGG "GCTOGGGCGG CACGACAGCG ,T6TCCCTCGT CCGGACGACC 

79151 GCGCGCACGG TQQACCAGCA GGGCGCCCTC GTGGCGGAGG GTGACGTCGG 

79201 • CCGCGGGCTC <5GCX5CCCGAA TCATCCGCCG TOOACGCACC • OTCCACGGCC 

wo 01/68867 PCT/GBOO/02072 

79251 TCGACCCGCC ACCGCCCGGC AAGAGCAAGA CGCAGCACGG CACGGCCGAC 
79301 GGAACCGGTT TCCTCCCCGA CGAQCAGAGT CTCGCCGCCG CGCGGCGCCA 
79351 CGACATCCGC CAGCACGTGA TACGCGGACA CATAGGCCCC CAAGGACCCG 
79401 GCCGCCTGCG COCAACTCCA GCCCGCCGGA ACCGGCATAA GCAGCGCGGC 
79451 ATCGGTGACG GCCACCGGGC CCACCGCGTC GAACAACCCC ATCACCCGGT 
79501 CGCCCACGGC CACCGAACCG ACCTCGCCGC CGACTTCCGT CACCACACCG 
79551 GCACCCTCGA CCTGGCCCGC CGTGAGCGGC CCCGGCGCCG CGGCCCGCAC 
79601 CGCCACCCGC ACCTCGTGGG GCTCCAGCGC CCGTCCGGCC TCGGGAGCGT
79651 CGACAAGGGA CAACTGCTGT CCGCCGCCCG CCTCTTGGCA CCGAGCCAGC 
79701 CGCCACGTGA GCGATCCGAC CGGCGGCACC AGCCGCACCG ACGCGTCGTC 
79751 GCGCACOAGC CGTGGCACGT AGGCGC3GCCC GTCACGCAGC GCCAATTCCG 
79801 GTTCGCCGGA GGCCAGTACG CCGGTCAGCG TGGCCGGAGA AGACTCCAGT 
79851 CCGTCCACGT CQAGCAGGGT GAGGCGACCG GGATTCTCGG CCTGCGCGCT 
79901 GCGCACCAGA CCCGACAGCG ACGCX3CCCGC CAGATCACCG GCGGTCTCAC ■ 
79951 CCGOCCGCGC GGCGACCGCG CCTCGGGTGA CGACGACGAG ACGGGTCGCC 
80001 GCGAACGCCG GGTCGTCCAC CCACTCCTTG AGCAGCGACA GAAGGGACAC 
80051 GGTGGCCAGC CGCGCGTACC CGGCCGGGTC GCCGCCCCTG CCATCGGCAT 
80101 CCGCAACGGC CCCGGCACCT GCGCCGGGCG' CGGCX5CACAC GGGGAGCACG 

80151 ACATCGGGCG CTTCGCeCCC AGCCGCCACT CCGTCCCGGA GCGCACCGAA
80201 CGTGTCCCAC ACGGGGCCGG CGGCCAGCGC ATCGGACAAG GCGTCGGCCA 
80251 GCGCACCGGC CGACGTACCG CCCATCGGGC CACTCTCGAC CGGCGCGAGG 

80301 ACCGCGGCAC GCGGGGCGCC GCCGCCCGTC TCCTCGGCCC GCGCGGCGAC
80351 CTCCATCCAC ACGAGCCGGA ACAGCGCGTC ACGGTCCGCC GCACQGGCGC 
80401 CCGCGATCTG GTGGGCGGCC ACCGGCCGTA CCGTGAGCGA CTCCAGCGTG
80451 AOAACCGGCT CCCCGCCTCC GCCC<XK3TCC ACGGCOGTGA. GGGCCAGCTG 
80501 GTCGQGCGCG GTGCGTCCGA TACGTACCCG CAACTTCTCA GCGCCCQQCQ 
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80551 CGTGCACCCG CAACCCGCTC CAGGAGAACG GCAGCfiGCAC TTGGTCGGTG 

80601 TCGGCGGACG ACGTGACCGC GTCCAGQATC AGCGCGTQCA GCGTGGCGTC 

80651 QAGCAACACC GGGTGCACCT GGTAGCGGTC GGCCCTGCCG CTCTCCGCCT

80701 CGGGCAGCOC CACCTCGGCG AAAAGGTCGT CCCCGAGCCG CCACGCGCTC 

80751 ACCAGTCCCT GTGAGCCGGG CCC6AAGTCA TAGCCGTACG AAGCGAGTTC 

80801 CCCGTACGGA TCCTGCTCOC CGACCGGTGT GGCGCCCGGG GGCGGCCACG 

80851 TCCCGCCGAA CQAGGCGTCC CCGGCGTCGG GCCCCGGGGG AGCGACCACG

80901 CCCGCGGCAT GCCGGGTCCA CACGGCCTCC TCGCCCTCAC CCGTGGGCCG

80951 CGAATGGACG GTCACGGGAC GCCGCCCGTC CTCGGCCACG GAACCGACCA 

81001 CCACCTGCAC GTCGACCGCG CCCGCACCCT CGTCCCCGAA GGCGAGCGGA 

81051 GTGTGCAGCG TCAGCTCCGC CAACTCCGCG CAGCCGGCCC GCACCGCGGC

81101 CTGCAGCGCG AGCTCCACGA ACGCCGAACC GGGCAGCAGC ACCGTGTCCA 

81151 TGACCOGGTG CTCGGCCAGC CACGCCTGGT CCCGCGGAGA GATCCGGCCG 

81201 GTCAGCAGGT GACTGCCGCC GTCCGCGAGT TCCACGGCGG CTCCQAGCAG ■ 

81251 CGGATGCCCC GCGOACGCGA GCCCGAQCCC CGCCGGGTCC CCGGCGAGCC . 

81301 CCCTGCGCCC CTCCAGCCAG AACCGCTCCC GCTGGAAGGC GTACGTCGGC 

81351 A6ATCCACCA CCCGAGGCAG CGGCACGGCC GGGAACCAGC CCGTCCAGTC • 

81401 GACCTCCGCC CCCGCGCCGA AGGCCTGGGC GGCCGCGCGG : GTGAGCTGCG 

81451 CGGCGTCGCC GTGGTCGCGG CGCAGGGTGG GCACGACGGT GGCGGGCATG

81501 TCGGCCCGCT CGATGGTCTC CTCCATGCCG AGGTTGAGGA CGGGGTGGGG ' 

81551 GCTGGCCTCG ATGAACAGGC GGTAGCCGTC GQCCAGCAGC GCTTCGATGG 
81601 TGTCGGCGAA GCQGACQGGC TGGCGGAGGT . TGGTGACCCA GTAATCCGTG 
81651 TCGAGGGTQG TGGTGTCGTC OAGGCGTTCO GCGGTGACCG TGGAGTAGAA 
81701 GGCGACGTCC GTGGTCGTGG GCGGGATGTC GGCCAGGCGC TCGGTGAGGA
81751 <3GTCGTGaAG CTGGTCGATC 1K3GGQGCCSGT GGGAGGOGTA TCCGACGTCG 
81801 ATGACGCGGG CGOGCAGGCC TCGCGCCTCC GCATGOGCGA CCAGGGCTGC
81851 CACATGCTCC OaCGGCCCTQ AAATQACCX3T AGAGGAGGGC CCGTTGACGG
81901 CAGCGACACA CACGCCGGGC CGGTCGCCGR. TSAGCTCAGC AACCTGCTCC 
81951 GAGCCGGCCC CCAACGACGC CATQTCX3CCC TGCCCCATGA GCTGACGGAG 
82001 CGCGTCACTG CX3TACGQCCA CGATCCGCGC CQCATCCTCC AGTGACAGTG 
82051 CCCCCGCCAC ACACGCGGCG GCCATCTCGC CCTGCGAGTG CCCGATQACG- ■ 
82101 QCRGCCGGQG TQATQCCGTA ATCXSGCCCAC ACCGAAGCCa. GCGAGACc5vT 
82151 CAqCGCCCAC AACACGGQCT GCACGACCTC GACCCGGGAC AGCTCACTCC 
82201 CGTCCCCGCQ CAACACCGCA CTCASCGACC AGTCCACATG CGCCGACAGG 
82251 GCCCGCTCAC ACTCCGCGAT CCGCQCCQCG AA6ACGGGGG ACTCGTC3^ 
82301 GAGCTCGGCA CCCATGCCCA CCCACTGCGA CCCCTGCCCC GGAAACACCA 
82351 ACACCGGACC CGCGCCGGAQ QCGCCCTGTA CGGCGCCCTC GACGACGTCC 
82401 GGTGACX3GCT CGCCCGCCGC CAGGQACCGT AGCCCGGCGA GQAOAGTCTG 
82451 GCGGTCCTTO CCCACGACGA CQOCrcaGTT CTCGAACACC OACCGGGTCT 
82501 TGACCAGGGA CCAGCCCftCG TCCAQCGGCG ACGCGAQCCG CGGQTCQGCG 
82551 GTGGCGCGGT COGCCAGCAO GCGGGCCTGG GCCCOCAGCG CCTCCTC6CC 
82601 GCGCGCCGAC ACCACCCAGG -GCACCS^CTCC GGCCGGCXJCC QCGGCGTCCT . 
82651 CCGCCGGAGC GGTCACGGGC TCCGGCGCGT CCX3GGGCCTG TTCCASOATG 
82701 AGGTOCGOOT TGGTGCCQGA GATQCCQAAO GCGGACACXIC CGGCGCGGCG 
82751 CGQQCQTTCQ CCQCGOSGCC AGGAGACCQG TTCGGACflfiC AGGCGOACQC 
82801 CACTOCCQTC CCAGTCCACG TOCGGCGTQG GCGCGJCGAT GTOCAGGGRG 
82851 GCGGGCAGCT GTTCGTTGCX3 CAGCGCCATQ ACCATCTT6A TCACACCQGC • 
82901 <SACACCQQCC GACGCCTGCG CGTGCCCGAT GTTCGACTTG ATCGAGCCGA . 
82951 GCCACAGCGG CCGQTCCGCO GQCCGCTCCT TGCCGTAGGT GGCX3ACX3AGC
83001 GCGCTGGCTT CGATGGGGTC GCCCAGCATG GTGCCGGTGC CGTGOGCCTC
83051 CACOGCGTCG AaSTCCTOQO CGGASAGCCO CX300TTQGCG A6T6CCTGCC
83101 GGATCACCGO CTQCTGGGCC TGCCOGTTGG GTGCCX3TGAG CCCQTTaCTC
83151 GTGCCGTCCT GGTTGATGGC CGAACCCCGG ATCACCGCCA GGACGTTGTG

83201 GCCGTTGCGC CGGGCCTCCG AGAGCCGTTC GAQTACGACC AGGCCGACTC 

83251 CCTCGGCCCA GCCGGTGCCG TCGGCGGCGG CCGCGAACGG CTTGCACCGG 

83301 CCX3TCCTTGG CGAGCCCGCG CTGCAGCGAG AACTCGACGA ACGAGCCCGG 

83351 CGTGGCCATC ACCGTCGCGC CGCCCGCGAG AGCGAGCGAG CACTCQCCCT * 

83401 GGCGCAGCGC GTGCX3CCGCC TGGTGGATCG CCACCAGGGA CGAAGAGCAG 

83451 CCGGTGTCGA TCGTCATGGC GGGGCCTTCT AGGCCGAGTA CGTACGACAC • 

83501 CCTGCCGGAG GCGACACAGC CGAGGTTGCC QGTGCCGATG TAGCCCTCGA

83551 CCTCGGTGGG CTGTTCACCG ACGAGCGGGA GGTAGTCjSAA GATGGTCAGG 

83601 CCCGTGAACA CCCCGGCGTC GCTGCCCTTG AGGGTCTCCC GGTCGAGGCC ' 

83651 CGCGCOTTCG ATCGCCTCCC ACGCQGTCTC CAOGAGCAGC CGCTGCTGCG 

83701 GGTCCATCGC GACGGCCTCG CGGGGGCTGA TGCCGAAGAA TCCGGCGTCG 

83751 AAGTOGCCCQ CGTCGTAGAG GAACCCGCCT TCGCGCACAT AGCTGGTGCC 

83801 GCGGCTCTCC GGQTCCQGGT CGTACAGCGT . CTCCAGGTCC CAGCCCCGGT .* 

83851 CGTCGGGGAA GGCCCCCATG GCGTCCTTGC CGGCCGCGAC CAGATCCCAC 

83901 AGCTCCTCGG CGGAGCGGAC GTCGCCCGGA TAGGGGCAGG CCATGCCGAC

83951 GATCGCGATC GGCTCGTCGT CGGCGGCGCC CCTGGAGGCC CCGGCCGCCC 

84001 QCACCGGQTC GGCGGAGGCC GCCGCGTCAC CX3GACAGCTC GGCCCGCAGG 

84051 ACGTCGGTGA GCX3CGTCGGG <3GTGGGGTGG TCGAAGACGA CCGTGGTCGG 

84101 CAGTGTCAGG CCGGTGCTCT TGTTCAGCCT GTTGCGCAGC TCCACCGCGG

84151 TCAGCGAGTC GAAGCCCAGC TCCTGGAACG GCTTGGTGQC GGGCACCGCG 

84201 TCGACGTCCG AGTGCCCCAG CGTGGCCGCC GCCTGGGAGC GCACGTGCTG

84251 CAGCAGCAAC TOCCGCTGCT GCGCCGGCTT OGCCTCCGTC AGCTCCTGCT 

84301 <3GAGCGACGA TGCCTCCGTG GCGTCTTCCT GCTGTGCCGC GGGTGCGCTG 

84351 GCCOGCCGGT TCTCGGGCAG ATCGGCGAGG AGGGGGCTGG GCCGCTGCGC 

84401 GGTGAACGTC GACGTGAACT GCGCCCAGTC GAAGTTCGCC ACGGTCAGCG

84451 TCX3TCTCACC CGCGTCCAGG GCCTGCTGCA GCGCCTTGAC GCACAGCTCC

84501 GGGCTGAGC6 GGTGCAQGCC GAAGCGGCTG AAGAACGTCA ACGCGGCCTG 

84551 GTCCGCCGCC ATGCCCGCCT CGGCCCAGGG CCCCCAG6CG ATGGAGGTGG 

84601 CGGGCAGGCC CTCGGCGCGG CGGTGCTCGG CGAGGGCGTC GAGGAAGTGG 

84651 TTGGCCX3CAC CATAGGCGCC CTGCTGGCCA CTGCCCCACA 'CGCCTGCGCC 

84701 CGACGAGAAC ATCACGAACG CCGAGAGCGG CAACTCCCGG GTCA6TT<?AT ' 

84751 GCAGATGGTG AGCGGCGAGC GCCTTCGQAC GCAGCACCTC GTCCAGCTCG 

84801 GCACCCGACA CGTCGCCGAG ACCGATGTAG TTCGGCACGC CGGCCGCGT6 

84851 GATGACGGCG GTCAGCGGGT GCTCGGCGGG GACATCGTCG ATGAGGCGTC

84901 GCACCTGCTC GCGGTCGCCG ACGTCGCAGG CGGTGACGGT QACGQCGGCC 

84951 CCCAACTCCG TCAGTTCCGC GGCGAGTTCC TGXGCTCCCG GGGCGTCGGG

85001 QCCQCXSGCGQ CTGOTCAGOA GQAGGTGOQG GGCX3CCCX3CA CGGGOSAGCC 

85051 ACCGCGCGAG GACGGCGCGG ATGCCGCCGG TCGCGCCGGT GATGAGAGTG 

85101 GTGCCGTCGG GCCGCCAACC AAGCCCGCTG CCGACCGTGT TGGCGGGCGC 

85151 GTGTGCAAGG CGACGGGCAT GGACGCCGGA CGGCCGGATG GAGATCTGGT • 

85201 CCTCGTCCTG CGGAACCAGC GCGGCGGCCA GCCGGGCCAQ CGTCTGATGG 

85251 TCGATACGAG CGGGCAGATC QACCAGCCCG CCCCACAGCC QCGGATACTC 

85301 CAGCGCMCG ACGCGCCCCA GCCCCCACAC CTGAGCCTGC ACCGGGTGGG 

85351 TGAGGGCGTC GCCGGCGCTC GTGGAAACAQ CCCCCTGCGT GAGAGTGCQT 

85401 ACGGCGATGT CGGCGCCGTT GTCCGCGAGG GCCTGGACGA GAGCGGTCGT 

85451 CjSCGQCGAGT. CCGGCGOGCA CGOCCGAGTG CTCGGGATGC GGCTCCTCQT 

85501 CGAGGGCCAG CAGATTGACG ACTCCGGCAA ACGCGGCCCC GTCCATCAGG 

85551 ACACGCAGCT CCTGCGCCAA CTCCGTACGC TCCATGGCAC GTOCGTCGAC 

85601 CACGTOGC!GT CGCACCTCGC CACCATGGGC. GGTCAGCGTC TGCGCGGTCG 

• 85651 CGAGGACGGC C<3GGTGGTCG GCGTGCGCGG CGGGCACGAG CAGCAGCCAG 

85701 GCCCCX3CTCA OCTCCGGOGC CXSGCMGTCG GGCAGATOCT TC5CAAGTGAC
85751 CTGATAACGC CAGGAGTCGA CGGTGGACTG CTCGCGGTGC CGACGCCGCC
85801 AGGCCGAGAG GACGGGCAGC GCGGACTCCA GCGCTCCGAC GCTCTCCGCC 
85851 TGCCCCTCGA TCTCCAGACT GCCGGCGAGG GCGTCGATGT CCAGGTCCTC 
85901 GATCGCCTGC CACACCCGGG CCTCGACCGG .ATCGTGCCCA CCACCCACGG 
85951 CTGCGACCGC CGCGGGCGGC TCCACCCAGT AGTGCTTGTG CTGGAAGGCG 
86001 TAGGTQGGGA GGTCGACGGT ACGGGGGGTG GGGTCGGCCG GGAACCAG'CG 
86051 CCGCGAGTCG ACGGGGGCGC CGGCGGTGAA. GGCGTGGGCG GCCGCGCGGG 
86101 TGAGCTGGGT GGTGTCACCG TOGTCGCGAC GCAGGGTGGG GATGGTGACG 
86151 GCCGTCCCCG CAGCACCGGC CTGCTGCTCG ATGGTCTCCT GGATGCCGAG 
86201 GTTGAGGACG GGGTGGGGGC TGGCCTCGAT GAACAGGCGG TAGCCGTCGG 
86251 CCAGCAGCGC TTCGATGdTG TCGGCGAAQC GGACGGGCTG GCGGAGGTTG 
86301 GTGACCCAGT AGGCGGTGTC TAGGGCGGTG GTGTCGTCQA GQCGCTCTGC 
86351 GGTGACCGTC GAGTAGAACX5 CCACGTCGGT GGTGGTCGGC TGGATGTCGG 
86401 CGAGCCGGTG GGTGAGGAGG TCGTGGAGCT GGTCGATCTG GGGACCGTOG 
86451 GAGGCGTACC TGACGTCGAT GACGCGGGCC CTGAGTCCCT GCGGCTCCGC 
86501 ATCGGCGACQ ACQGCTGCCA CATGCTCCGG CGGGCCCGAA ATCACGGTCG 
86551 ACGACGGTCC GTTGACGGCC GCGAOGACTA CGCCCGGCCG GTCGCCGATC 
86601 AGCTCTOCGG CCTQCTCGGC ACCG6TGCTG AGCGAGGCCA TGTCGCCGTG 
86651 CCCTTGCAGC TGACGAAGCG CGTCGCTGCG TACGGCTACG ATCCGTGCCQ 
86701 CATCCTCCAG TGACAGTGCC -CCCGCCACAC ACGCGGCAGC CATCTCGCCC 
86751 TGCGAGTGCC CGATGACGGC AGCCGGGQTG ATGCCGTAAT CGGCCCACAC 
86801 CGCAGCCAGC GAGACCATCA CCGCCCACAG CACGGGCTGC ACGACCTCGA 
86851 CCCGGGACAQ CTCGCTCCXX3 TCCCCGCGCA AGACATCACT CAGCGACCAG 
86901 TCCACATGCG CCGACAGCGC CTGCTCACAC TCCGCGATCC GCGCCGCGAA 
86951 GACX3GGCGAC TCX3TCAAGGA GCTGGGCX3CC CATGCCCACC CACTGCGACC , . 
87001 CCTGC<:CCGG AAACACCAAC ACCJGGCCCAG GACCGACATC ACCQGCCACC 
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87051 CCGGCCACCA CATCCGCCOA CGCCTCACCC QCGGCCAATG CCTCCAfiGCT 
87101 GGCACCAGCC TGAGCCAAGT CCCGCCCCAC GACCACGGCC CGCTQATCGA
87151 ACAACGCGCG TGTCGTGGCC AGCQACCAGC CCACCTCGQA QACCQACGCA 
87201 TCCQCCASCC CQGCCGCQAA CTCQCGCAGC CGCCQCGCCT GTTCACGCAA 
87251 CGCGTCCGGC GTCCGCCCGG ACACCACCCA CQOCS^CGACC CCACCCQGCT 
87301 CRGCCGCCAC GGGaCCCQQC GCGTCCTCTT CCGGCXK3CGC CTCCTCCjfeA 
87351 ATCAGGTGCG CQTTCGTCCC GGAOATCCCXS AACGCCGAGA TGCCTGCCCG • 
87401 CCGCGTGCGC TCCGCCGGCC AgTCCACGGG CTCGGAGAGC AGTCGTACGC 
87451 TGCCCTGTTC CCACTGGACQ TGCGGTGACG GGGC3GTCGAT 0TGCAGGGAG 
87501 QTCGQOftflGA GACCQTT6C6 CATCGCCATG ACCATCTTGA TQAOQCCCOC 
87551 GACACCGGCQ GCGQCCTOCG TQTGQCCQAT GTTGGATTTC ACCGAGCCGA 
87601 GCCAGAGCGG ACQGTCCTCC GGGCGCCCCT OGCOOTACGT GGCGATCRGG 
87651 GCCTGCGCCT CGATGOGGTC GCCGAGCGTG 0TGCCG6TGC CQTGCGCCTC • 
87701 TACGGCGTCG ATGTCCTCGG CGGaOAQGCG GGCGrTGOCQ AGOGCGGCGC 
87751 GGATGACGCO TTCCTGGQAfl GGGCCGTTGG GGGCGGCQAG CCCGTTGCTC
87801 GTACCGTCCT GGTTGGTGGC CGAACCCCOT ATCACCQCAA GGACCTTGTG 
87851 GCCXK^GGCGC CGCGCOTCGG AGAGOVGCTC Ca^GCGCCACC ACCXXX3GCaC

87901 CCTCGCCCCA GCC6GTGCCQ TOGGCGGCGG- CCQCGAACGG CTTGCACCGC 
87951 CCGTCGGGCG CGAGCCCCCG CTGCCGGGAO AACTCGGTGA ACQRACCCGG 
88001 COTCGCO^TC ACCGTCGAAC CGCCCGCCAG .CGCGAGCGAG CACTCGCCCT 

88051 GCCGCAGCGC CTGACTTOCC AGATQGATCG CCRCCAGGGA CXSACSAGCftC 
88101 GCCQTGTO3A OMTGACCGC GGGAGCTTCG AGCCCCACCG TGTAGGAGAT •• 
88151 CCGGCCCGAC ACCACACTGC CGAGGTTGCC GGTQCCGATG TACCCCTCGA • 
88201 CGTCGCTGGC CGTCTGGCTO ATCAGCGTCA GGT&GTOGTQ QGCQCTCACT 
88251 OCSGTGAASA CGCX3GGTCTC GCTOGCCTTC AGOGOSTGCG GGTTCATGCC
88301 CGCGTGCTCG ATCGCCTCCC AOQOGGTCTC CAGQAOCAGC OGCTGCTGCQ
88351 GATCCATCGC CGTGGCCTCG CGCGGGCTGA TGCCGAAGAA CTCGGCGTCG

88401 AAATGOCCGG CGTCGTACAG GAAGQGGCCG TCCCGCACAT AGCTGGTGGC 

88451 CGGATGCTCC GGATCCGGGT GATACAGCGA - CTCCAGGTCC CAGCCCCGGT ' 

88501 CGTCGGGGAA CCCCGCGACC GCX3TCACCCC CGTCGCGCAC GAGTTCCCAG 

88551 AGGTCCTCCG CCGACCGGGC. GCCGCCCGGG TAGCGGCAGQ CCATGCCGAC 

88601 GATGGCGACC GGCTCGGTCG ATTCCTTGTC GTGGAGCCGT TGCCGGGCfCT 

88651 GGCGCAGCTC CGCGGTGACC CACTTGAQGT GATCGAGAAG CTTCTCCTCG 

88701 TTCGACATCT GACCCAGGCT CCTTGGCGCT ACGTGGTGAT CGGGGCGTAT 

88751 QAGOTTOGGG GAGGGCAAGO G6GCCGGTGT GGCCGGGGCT CATCGCGCTC 

88801 AGGACTGATC GCTGCTCAGG ACTTCCCGAA CTCACTGGAG ATGAGGTCGA 

88851 AGATGTCGTC CGCGCTCGCC GCCTCCAGAT CGGCATGGGC CGAATCAGTG 

88901 CCTTCCGGCC CGTCCTGCGC CGGACTCCAC TTCGACACAA GGACCTGCAG ' 

88951 CCGGCCCACG ATGCGGCGCC GGGCCGCCTC GTCCACCTCG GCCGCTCCGA 

89001 ACGCCGTGTC CCACTTGTCX3 AGCGCCOCGA GCACGTCGCC CTCACCTGCG 

89051 ACCTCGGCGC CGTCGCCGAG CTGTCCGCGC AAGTGCGTGG CGAGGGCCTC

89101 GGGCGTGGGA TGGTCGAAGA TCACGGTGGC GGGQAGCGAG AGTCCGGTCG 

89151 TGGTGTTGAG CTGGTTGCGC AGCTGGACCG CGGTGAGCGA GTCGAAGCCC 

89201 AGCTCCTGGA ACX3QCTTCGC GGC3GGGAATG TCCTCCACCG TGCGGCCGAG

89251 CGTCGCGGCC GCGTATGTCG GGACCTGCTG GACCAGGAAG CCGAGCCGCT 

89301 GTGATGCGGG CGTCTTCX3CC AGCTCCTCGC GGAAGGCGCT CGTCTCGGCG 

89351 GCGOTCCCCG TCTGCTCGGC CTCCOGCTGG *TCTCGGGAA GGTCGTCGAG 

89401 GAACGGACTG GGCCGCTGCG CGGTGAACGT CGGCGTGAAC TTCGCCCAGT 

89451 CQAAGTTOGC CACGGTCAGC QTGGCGTCGC COGCGTCGAC CGCCTGGTGC 

89501 AGCGCCTTGA CGCACAGATC .CGGAGCGATC GGGAGCAGAC GGAAGC2GCTT 

89551 OAAGTACGTC AGTGACTCCG GGTCGGCGGA CATGCCfCGCC TCGGCCCAGG 

89601 GCCCCCAGGC GATGGAGGTG GOGGGCAGGC CCTGGGCGCG GCGGTGCTCG

89651 GCGAGGGCGT CGAGGAAGTG GTTGGCCGCA CCATAGGCGC CCTGCTGGCC

89701 ACTGCCCCAC ACGCCGGOGC COGACOAGAA CATCACGAAC GCCGAGAGGT

89751 CCAGGTCGCG CGTCAACTCG TGCAGGTTCC AGGCCGCGTC X5GACTTCGAC 

89801 CCCAGCACCT CGCCGAGGCG CGCGGTCGTC AQATCACCGA TCGCGGTCAG 

89851 ATCGGTCATG CCCGCCGCGT GGATGACGGC TGTGAGGGGA TQCTCGGCCG ' . 

89901 GCATGTCGTC GATGAGGCCG CTCAGTTGGC GGGGATCGCT GACGTCG(?AG 

89951 GCGGTGATGG TGACGGCGQT GCCGAGCCCG TCQAGCTCGO CGGCGAGTTC 

90001 CCGGGCGCCG GGCGCGTCGG GCCCGCGACG GCTGGTGAGG TGAAGACGGG 

90051 GGGCGCCCTG CCGGGCCAGC CAACQGGCGA GGACGGCACC GATGCCGCCG . 

90101 GTCCCGCCGG TGATCAGGGT GGTGCCXICGA GGCCGCCAGG TGGCCTCGCT 

90151 GTGCACGGGA TTCTGAATGC TTCCQACGQC GTGCGTQAGG CGCCGGTGGT 

90201 GGATTCCGGT GGGGCGGACG GCGGTCTGGT CCTCGTCGTC CTGGGGGAGG 

90251 AGAGCdGCGG CX3tAQGCGGGG GAGGGTGTGG CGGTCGATAC GAGCGGGGAG 

90301 GTCGACGAGT CCGGCCCAGA GGCGCGGGTG TTCGAGGGCT GCGACGCGGC 

90351 CGAGCCCCCA .GAeGTGAGCC TCGAGGGGGT GGGTGAQTGG GTCGGTGGCG 

90401 GCCGTGGACA CGGCACCCTG CGTGACGGTG TGCAGGGGTG CGGTCGTGCC 

90451 GTTGTCGCCG AGGGCCTGGA GGAGAGCGGT CGTCGCGGCG AGCCGGGCGG 

90501 GCACGGCGGG GTGCTCGGGG TGCGGCXCCT CGTCCAGCGC CAGCAGATTG 

90551 ACX5ATTCCGG CAAGACCGGC CGTGTCCACC GCGGCCAGCT CCTQACOTeC 

90601 CGCCCGGCCG GTCTCGACCG GATGCAGCCG GACGGCGGCC GCCCCGTGCT 

90651 CGCTCAACGC CTCQGCQQTQ GCTCX3TACQG CfGGGGTGCTC CGCCTTGTCG 

90701 GCAGGGACGA ACAGCAGCCA GTCGCCGCCG AGTTCCGGTG CGGGCCCGTC 

90751 GGACCGCTGT TTCCACGTQA CGCGGTACCG CCAQGAGTCG ATGGTCGCCT 

90801 GGTCCTGGTG. OCGACGCCGC CAGCCCTTGA GCACOGGCAA CGCGGGCTCC . 

90851 AGCGCCCX3GA CCGCXITCCTC GCTGCCCTCC TCCGACCCCA OCGTCTCGGC

90901 CAGGA<3ACCG AGATCGAGCT CCTCGAGGGC QTGGCACAGC TGGGCCTCGG 
90951 CCGCACTCTG CTCACCGCTG ACGGCGCCCG AGGCGGACGC GGAACGTTCG

91001 AGCCAGTAGT GCTGGTGTTQ GAAGGCGTAG GTGGGGAGGT CGACGGTGCG 

91051 GGGGGTGGGG TCGGCCGGGA ACCAGCGCCG CCAGTCGACG GGGGCGCCGG 

91101 CGGTGAAGGC- GTGGGCGGCG GCACGGGTGA GCTGGGTGGT GTCGCCGTGG ' 

91151 TCGCGGCGGA GGGTGGGGAC GACGGTQGCG GGGATGTCCG CCTQCTCGAT 

91201 GGTCTCCTCC ATGCCCAGGC CCAGCACGGG GTGGGCGCTG GCCTCGAf&A 

91251 ACAGGOGGTA GCCGTCCGCG- AGAAGGGCTT CGATGGTGTC GGCGAACCGG 

91301 ACCGGCTGGC GGAGGTTGGT cScCCAGTAA TCCGTATCCA GGGCTGTGGT 

91351 . GTCCGTCAGA CGCTCGGCGG TGACGGTCGA ATAGAAGGCC ACGTCCGTGT • 

91401 TCGCGGGCCG GATGTCAGCC AGGCGTTCGG TCAGCAGATC GTGGAGCTGG 

91451 TCGATCTGGG GGCCATGCGA GGCGTACCCG ACGTCGATGA CACGGGCGCG 

91501 CAGACCACGT GCCTCCGCAT CGQCGACCAC GGCAGCCACA TGCTCCGGCG 

91551 GCCCTGAAAT CACCGTAGAC GACGGCCCAT TGACCGCCGC GACGACCACG

91601 CCCGOCCGGT . CACCGATCAG CTCAGCGGCC TGCTCGGCAC CGGTGCTCAG 

91651 CGAGGCCATG TCACCGTGCC CTTGCAGCCG ACGAAGCGCG TCACTGCX3TA

91701 CGGCTAOGAT GCGCGCCGCA TCCTCCAGCG ACAGCGCCCC CGCGACGCAC. 

91751 GCGGCAGCCA TCTCACCCTG CX3AGTGCCCG ATCACAGCAG CCGGAGTGAC

91801 CCCGTAATCA GCCCACACCG CAGCCAGCGA GACCATCACC GCCCACAACA 

91851 CCGGCTGCAC GACCTCGACC CX3GGACAGCT CACTCCCATC CCCGCGCAAC 

91901 ACCGCACTCA GCGACCAGTC CACATACGCC GACAGGGCCC GCTCACACTC 
91951 CGCAATCCGC GCCGCGAAGA CGGGGGACTC GTCCAGCAGC TGGGCACCCA . 
92001 TGCCCACCCA CTGCGACCCG TGGCCCGGAA ACACCAACAC CGGCCCAQQA 
92051 CCCACATCAC CAGCAACCXJC GGCCACCACA CCCGCCGAAG CCTCACCCGC . 

92101 AGCCAACGCC CGCAGGCCAG COOTCAACGC ATCGOGGTCA OQCTCCACCA 
92151 GGACAGCCCG GTGCTCGAAC ACCGACCGGG TCGTGGTCAA CGACCAGCCC
92201 ACATCAGCCG CCX3A0GCATC OGCOGGCCGG GCXX3CGAACT CGCCCAGCCG 
92251 CCGCGCCTGT GCACGCAQCG CGTCCGGCGT CCGCCCGGAC ACCACCCACG

92301 GAACGACCCC ACCCGGCTCC TCGGCCACGG AGCCCGGCAC GTCCTCCTCC ' 

92351 TCCGGTGGTG CCTCCTCCAG GATCAGATGC GCGTTCGTCC CCGAGAAGCC 

92401 GAACGAGGAC ACCCCCGCCC . GG.CGCGGGCG CTCGCCCCGG GGCCACTTCA 

92451 CCGGGTCGGT GAGCAGGCGC AGCCCGCTGC CGTCCCACTC CACGTGGGGC 

92501 GAGGGGGCGT CGACGTGCAG GATGGCGG.GC AGCAGGTCGT GCCGCAGGGG 

92551 CAGGACCATC TTGATGACAC CGGCCACACC GOCGGCGATC TGCGTGTGGC

92601 CGATGTTGGA CTTCACCGCT cScACCCACA • GCGGCCGGTC CTCCGGCCGT 

92651 TCCCGGCCGT AGGCGGAOAT GAGAGCCCCG GCCTCGATGG GGTCGCCGAG 

92701 CGTGGTGCCG GTGCCGTGCG CCTCCACGGC GTCGATGTCC TCQGQGGCGA 

92751 GGCGGGCGTT GGCGAGGGCG GCGCGGATGA CGCGTTCCTG GGCGGGGCCG 

92801 TTGGGGGCGG TCAGGCCATT GCTCGCGCCG TCCTGGTTQA TCGCCX3AACC 

92851 CCGGATCACC GCGAGGACCT TGTGGCCCTT CCTGCGGGCG TCGGAGAGAC 

92901 GCTCAAGGAG AACCACCCCC GTACCCTCCG CCAT6CCCAT GCCGTCGCTG 

92951 CTCGCCGAGA ACGGCTTGCA CCGTCCGTCG GGGGCCAGGC CGCGCAGTTC

93001 GCTGAAGCCG ATCAGCGGGG CGGGCGACGA CATCACGTAC GTGCCGCCCG 

93051 CCAGCGCCAG CGAGCACTCC TGTGTGCGCA GQGCCTGGGT GGCGAGGTQA 

93101 AGGGAGAGCA GCGACGAG6A GCACGCCGTG TCGACCGTCA CCQCGGGGCC. . 

93151 TTCGAGGCCC AGGQTGTAGQ CGACGCGGGC GGAGGTGACG CTGCCGGAGT 

93201 TGCCGATGGT GAAGTATCCG GCGGTGCCCT C6GGGACCTC GGACGCGCCG' 

93251 AGGGCQTAGT C!GAQTC!CGTC ACAGCOGATG AAGGTGCTGG TGTCQCTGGA 

93301 GCGGAGGCTG AGGGGGTCGA TGCCGGCCCG TTCGATCGCC TCCCACGCCG 

93351 TCTCCAGGGC GAGCCGCTGC TG.CGGCGCCA TGGCCGCGGC CTCGGTGGGT 

93401 COGATGCCGA AGAAGGTGGG GTCGAAGTCA .CCGGCGTCGT AQAGGAAGCC 

93451 GCCTTCCCGG ACGTAACTGG TGCCGGTGCT CTCG<3GGTCG GGGTCGTAGA

93501 GGGAATCX3AG OTCCCAGTTG GGGTTGGGGG GCAGGQGCGC GACGGCGTCQ 

WO 01/68867 PCT/GB00/02072

93551 CCGCCGGTGG AGACCAGCTC CCAGAACTCT TCGGGAGACC GGACTCCGGC
93601 GGGCAGCCGG CAQGCCATGC CGATOACCQC GACCGGTTCG TGGCCCGCCG 
93651 ACTCGACGTC CTGCAGCCGG CGTTCCGTCT GACGCAGQTC CGCGGTGACA 
93701 CX3CTTGAGGT ATTCCAGAAG TTTCTCTTCG GTGTGCGCCA TCCCGGTGAC
93751 AACCGCCCCT CTCCGCGAGA ACAGACCGCA GACTCGTCGA CGQCOCTAAA 
93801 6CCCTCCTAA TACTCGGCTG TGTACCGCTC GCTGCCACGG GTGTCCGCAC 
93851 TGGTCGGAGG CTCCGGCCCA GGGAACAGG6 GCTTTCTTAG GGGCGCTTAA 
93901 GCGGTGCCTG CCAGGGTGTG cSgGTGXCAG GCCGTCACGC CCTGATCAGC 
93951 GGCGTCGCCC GTQCCGTGCC CGTQCQQTCG GTGGGCCTGA CCGTCGGTCC 
94001 GGACAACGCG AAGCGAGGCA TCGTGCCCAT CACGGATAGC AAGCCGGCCG 
94051 CCACATTCCC CGACCTGGTC GACCCX3TCGT TCTGGGCGCG GCCGCACGCG 
94101 GAACGCGTGG CGCTGTTCGA GGAQATQCQC GGC3CTQCCX3C GGCCGGCGTT 
94151 CATCCGGCAG AACATGCCCG GCGTGCCCTG GACGTTCGQC TACCACGCGC 
94201 TGGTCAApTA CGCGGACATC GTGGAQGTGA GCCGCCGCCC GCAGGACTTC 
94251 TCCTCGAACG GCGCGACCAC CATCATCGGT CTGCCGCCCG AGCTGGACGA
94301 OTACTACGQC TCGATGATCA ACATGGACAA CCCGGAACAC TGGCGGCTGC
94351 GGCGCATCGT CTCdCGTTCG TTCGGCCGCA ACAtGATCCC CGAGTTCGAG 
94401 GCCGTGGCOA CCCGCACCGC CCGCCGCATC ATCGACGAGC TCATCGCGCG 
94451 GGGACCCGGC GACTTCATCA GGCCCQTCGC CGCGGAGATG CCCATCGCCG
94501 TGCTCAGCGA CATGATGGGC- ATCCCGGCGG AGGACGACGA CTTCCTCTTC 
94551 GACCGGTCCA ACACQATCOT CGGCCCCCTC <5ACCCGGACT ACGTGCCGGA 
94601 CCGGGCGGAC TCCGAACGGG CGGTGATCGA GGCGTCACGC GAACTCGGCG 
94651 ACTACATCGC TGGCCTTCGT ' GCGGAACGGC TCGCCGCCCC CGGCAACGAC . 
94701 CTCATCACCA AGCTCGTGCA AGTCCAGGCG <3ACGGCGAGC AQTTGACGCQ 
94751 OCAGQAACTC <3TCTCCTTCT TCATCCTGCT CGTCATCGCC <5GGATGGAGA
94801 CCACCXX3CAA CGCCATCTCG CACGGGCTGG TACTGCTGAC CGAGCATCCC
94851 GAGCAGAAGC AGCTGCTGCT CTCGGACTTC GACACGCACG CGCCGAACGC

94901 GGTCGAGGAG ATCCTCAGQG TCTCCACGCC CATCAACTGG ATQCGGCGCG 

94951 TCGGCACCCG CGACTGCGAC ATGAACGGCC ACAGGTTCCG CAGGGGCGAC 

95001 CGGATCTTGC TGTTCTACTG GTCGGQCAAC CGGGACGAAT CCGTCTTCCC • 

95051 TGACCCGTAC CGGTTCGACA TCACQCGCQG GACGAACGCG CACGTCACGT 

95101 TCGGCGCGGT GGGCCCGCAC GTCTGCCTCG GGGCCCACCT CGCCCGTiCrG

95151 GAQATCACCG TCCTQTACCG GGAGCTGCTC GCGQCGCTGC CCCAGATCCA • 

95201 TGCCGTGGGG CAGCCCCGCA gScTGGACTC CAGCTXCATC GAAGGGATCA 

95251 AGCACCTGCA CTGCGCCTTC TQAQCACATA CGCTTCCCTC TGCGCATGTG

95301 CGCTCACGAC GCTCCGATCA GCGACTGCCA ACGACXGTCA GCGACCGGAC 

95351 AGGGCCAAGG GCX3GTGGGGA CATCAGGTGC ATGTCACCCG CGAGTATGGC 

95401 CCGCTGCAGC TCCTGGAGCO GGCGCCOGGG TTCGAGCCCC AGCTCGTCGT 

95451 TGAGCGTCTT GCGCACCGAC TGGTACACCT TCAGCGCGTC CGCCTGCCGC . 

95501 TCGGAGC6GT AGAGCGCCAG CATCAGCTGG CGGTAGAACG CCTCGCACAT 

95551 C6GGTTCTCC GCGGTGAGGG CGTACAGCAT GCCCACGGCC TCGCGGTGCC 

95601 GGCCGAGCTG GAGCTGQCAC TCGACGAGCA TCTCCTGACA. CXCCAGGCGG 

95651 ATCTCGGTCA GCCAGGTCGA GAAGCCGTCG ATGATCGGGC CGTTGGTGCX: 

95701 QGGACCGTTC CCGCCCTGCC CGAGGATCGG GCCGCGCCAC AGCGCGAGCG 

95751 CCTGCCCGAA ACAGGAGGCC GCCTCGTCGA ACCGCTTCTC CCTGAGCAAC 

95801 GACGGCCCCA CGTCCACCAQ TTCGGGGAAG ATCTGGGCAT GGATCTGGTC 

95851 GTCGTCCCQC' TTGTGCAGGA CGTACCCCGG CGCACGGGTC TCGACGGGGT 

95901 TGCCCGCCGA ACCGGGCACC TTGAGGAACT TGCGGAGCTG GGAGATGTAC 

95951 ACATQCAGTC CXrOCCGTGGC GCGCCGCQGC AGQTCCTCGC CCCAGATCTC 

96001 CCGCATCAGC TGCTCCAGGG AGACCACCCG GTCGGCGCX3G ATGAGGAGCA . 

96051 CGOTGAGGAC <3ATCTCCACC TTCTGGGCGT TGATGOTGGC GTAGTCaTTT 

96101 CCOTCXiOTaA TaOQOAGCXKI QCCCMCaCTT TOOTATCTCA CCX3A0CQTK:
96151 CCCCTTGCTG TCGCACGCTG CTGCGCACTG TCGGCCAGGG CCTTGGAGAT

96201 GACTTCCQTG ACGCCCTGCT GQTGCGTGTT CAQATAGAAG TGGCCCCCGG 

96251 CGAAGACCTT GAGGTC5GAAG QGGCCTTCiCG TGTGCTGCTG CCAQGCCTCG 

96301 ACCTCGTCCA GCGGCGCCTG CGGGTCCCGG TCGCCCACCA GGGCGGTGAT 

96351 GGGGCAGGAC AGCGGCGGCG ACGGGTTCCA CCGGTACAGC TCGACCGCCC 

96401 GGTAGTCGTT GCGGACGACC GGGATGATCT CCGCGAGCAG TTCCTCGlfCG 

96451 TCCAGGAACC GCGGGTCAGT GCCACCGGCC CGGCGCAGCT CGGCGGCCAA 

96501 CTCGGTGTCG TCGAGGAGGT GTACGGTGCC GCGCCGGAAG CGGGACGGCG 

96551 CGCQGCGTCC CGAGACGAAC AGCCGGCAGG GCTGCTTCCC CGTGCGCTCG 

96601 CGGAGCCGCT GGGCGACTTC GTAGGCGAGG ACGGCGCCCA T6CT0TGGCC 

96651 GAAGAACGCC AACGGGCGGT CGTCGAACGG GCCGAGCGCA TCGGTGATGA 

96701 GGTCQGC3GAO TTCCCCGATG TCGTCCAGGA GCCGCTCTCT GCGGCGGTCC 

96751 TGTCGCCCGG GGTACTGCAC CGCGAGGACC TCGCTGTCGG TCGGGAGAGT 

96801 GGGGGATTGC GCAAGQGGGT .GGTAGTAGGA GGCCGAGC.CG CCCGCGTGGG 

96851 GGAAGCAGAC CAGGCGAACG ACGGCTTCCG GTCGGGGCCG GAAGCGACGT 

96901 ATCCAAGGGT CCGACATATC GGGTGGGGGG AAGGCAGACA AGATCTTTCC 

96951 CTTCGCCAGG AACGCTGACA ACGGTGTGTC GCCACATCAC ATAGCCGCTC 

97001 CTGATCATGC GCAGCTCAAA GTTTAAACGG CAACGTCGCT AACGGGGGAQ 

97051 CAGGGCGGAA TCAOACATTC CCCATCCTTT ATTCCGCGAT TCTTACGTGA 

97101 TCGAATCCCG GCGGCCAAGA TGGAGTAAAT TTCAATATGA ATGCTTAACG 

97151 CCGGACAGCT TOTACGGOSG GCCX3CCQGQG CGGTGACTGG CGTCCCTGCC 

97201 AGCCGTGATG GCCTGACGAG GCCTCCGGGA TCCATCCCCC GCCCX3CTGTC 

97251 GCCGAQTTCT TTGCX3GGATT ATTACGTTGC ATTGGTTTGC TTCGTGGCCC . 

97301 GGGCGGTTGG COTGCGCTAT TTGGCAGCCT TCCX3TCATGG GTGGTAAAAG 

97351 ATCGCCTTTC CCCTCTGGGG TGCCGGTGGA GCTGGCCTCG AGCGGGATTG

97401 TGGCTTGTTG TTraCTTGTG GCGCCGCGTG TGAAACAGCG GCAGTTGGCC 
97451 ACTCGCTCTG ACAGOCTCCG GGGACGGGGT TGTCACCTTT TGGGGTGACT
97501 GGCCTCGTTC AAGGCQTCCT GQCCCGTGGT GCATCCGCGA TCGTCGTGCC
97551 ATGGGTGAAG TGQGAAGGAG CACAGAACGA TGAGCGAGAG CATGGCGTGG
97601 C3TGACGCQGG ACGTCCGCAA GGCCCGCAAG GAGGGCAGTG CGGGGACCGC
97651 QCGGCGCCGA GCCGACCGGC TGGCQGACCT GGTCGCCCAC GCCCGCTCGG 
97701 CGTCGCCGTA CTACCGGGAG CTCTACCACG GCCTGCCCGA GCGGATCCSVG 
97751 GACCCOACGC TGCTQCCGGT OACGGACAAG AAGCAGCTGA TGGACCACTT 
97801 CGACGACTGG CCGACGGACC gSgACATCAC CTTCGAGAAG GTCCGCGCGT 
97851 TCACCQACQA CCCCQAGCTQ ATCGGGCGGC GCTTCCTCGG CCGCTATCTG 
97901 GTGGCCACCA CGTCGGGCAC CAGCGGCAGG CGCGQCCTGT TCGTGCTCGA 
97951 CGACCGOTAC ATQAACGTOT CCTCCGCCGT CTCCTCCCGG GTGCTCGCCT 
98001 CCTGGCTCGG CCCCCTCGGC ATCGCCCGGG CCGTOGTCCA CGGCGGCCGC 
98051 TTCGCCCAAC TC6TCGCCAC CGAGGGACAT TACGTCGGCT TCGCCGGATA 
98101 CTCCCGCCTG CGCCAGGACG GCGGAGCGC2G CAGCAAGCTC GTCCGCGCGT 
98151 TCTCTGTGCA CGAGCCGATG TCACGTCTGG TCGCCGAACT CAACGAGTAC 
98201 CGGCCCGCGT TCGTCATCGG CTACGCCAGT ACGATCATGC TGCTCACCGC
98251 CGAACAGGAA GCGGGCCGGC TGCACATCGA CCCGGTGCTG GTCGAGCCCQ 
98301 CGGGCGAGAC GATGACCGAG AGCGACACCG ACCGCATCGC TGCGGCGTTC 
98351 GGCGCCAAGG TGCGCACQAT GTACAQCGCQ ACCGAGTGCA CCTACCTCAG 
98401 CCACGGCTGC <5CCGAGGGCT GGTACCACGT CAACGACGAC TGGGCCGTGC 
98451 TCGAACCGGT CGACGCCQAC CACCGGCCCA CCCCGCGGGG GGAGTTCTCG
98501 CACACCACCC TGATCAGCAA CCTCGCCAAC CGCGTCCAGC CGTTCCTCCG
98551 CTACGACCTQ GGCGACAGCG TCATGCTCCQ CCCCGACCCC TGCCCCTGCG 
98601 GCACCCCCTC GCCOGCGATC CGGGTOCASG GCAG0TC6GG CGAGATCCTC 
98651 ACCTTCCCCT OGGGCCGGGG CGACGACGTC AGGGTCGCCC GGCTCGCCTT
98701 CAGCAGCCTC TTCGAOCGCA TGCCCGGAGT OGAGCTCTTC CAGATCGAGC 
98751 AGACCQCQCC GTCGACCCTQ CQCGTCCGCG TOGTCCAGQC GCCCGGCGCC
98801 OACGCXSGACC ACGTGTGGCA GCGQGCX:caC GACQQGCTGA CCCACCTCCT 
98851 CGCCGACAAC AAQCTCGACA ACGTAACCGT CGAACGGGQC OAQQAGCCGC 
98901 CGCGGCAGGC ATCCGQCGGC AAGTACCX3GA CGATCATCCC GCTCXSCCGCC • 
98951 TGAACGCTCQ CCGACTAGCC GCGCGCCGCC TQAGCTGCTC TCACCGCGCG 
99001 TACGQGCGCA 6CGGAGGCTC CTCGTCQACC CACGGCTGGC TGTQGAT«G 
99051 CAGCTC6ATC GGGAASTTCA QCAGGCCGGG CAGGGC6TCG ACGGCCTCCT 
99101 GGCTGTTQAQ CGQCATGACC GGCTTGGCGC AGTGCGCQCG GTCGATGCGO 
99151 CTCXrrQTCGO CdGQACCOGQ GTGCTCGATC GCATCGGCX3A CCAGGTCGTA 
99201 GCTGACCTGQ TCGACGACCA TGGCGATQTQ GQTCGOCCAC GSCCGACCCG 
99251 aACAflATGTC CTGGACGCCQ ATQCGGTGGG CGCCCGGCAG CGACGGCGCC 
99301 TCCCCQTCGG CCACCACGGA CTCGTCGQC6 TATGAATASA TCQTGGTQTA - 
99351 COAOGGTCCT OCQGGCATGG GC6TGCCGTC GGCGCCCAGA GCCTTCGACC 
99401 AGTTCGA6TC GCGGGCGAAC TQCAGQACCG ACGCCGGGCA GCCCGCCACC 
99451 TCGGCQATCG GGCGGCAGGG CGAGGCCAGC CGGGTCCCCT GGAACGGOGA 
99501 GCCCAGQGTC ACCATGTCGT CGACCTTCCC CGGCAGGTCC GGCCAGAAGC 
99551 GCAGGGCCCA CGCCGTGAGG AGGCCGCCCT GGCTQTQCCC OACGAGATCG 
99601 ACCTTCCGGC CS3GTGGCCTC CTOGATCGOG CQGGTCGCGT ACACCACGTA 
99651 CTCGACGGAC TCCTGCATOT CAOGQAGCCC QCQACCGQQA GAATCCACCC 
99701 AACAGGACTG GTAGCCCTTC TTCTTCAACT CGGCCATGTA GTTCCAGGCG 
99751 TAGTTCTCCT CQCCXOTGAfl «COGQTCCCO GQCACGAAGA .GQACGaTCGG 
99801 CTTGTCACCG GCGTCACGCA GGTeCCCCAQ CTCCQTCCCG CAGTGCAGCG 
99851 CCTTGGCGM CTOOOCCGCC GQTATCTCCA ACGGGGGAGA GGAAACATCC 
99901 GCCGCCQAAG CGGCGGAGGC CGGftASCACG GTOGCX3GCX» GCACGGCCGC 
99951 CACOAGTCCG CCGAQCO^TG AGGACAAGGG CACGGTGACC TCCACAGGAA • 
100001 CCTTCACGAG TGAOOQGAAA CTCCCTCCGG AGGGAGCACC TCATCGTGCG 
100051 GCX3GCGCCAC AaTAGCCGTC AACTQCCCCA CGGGGCTGA6 TAOTTGACAG
100101 TTGOCCGGGC TCGQCOGQCO AASCXJCCCGa GCCCCQCCGC CCCGCGCCGT 
100151 OCGCGAGGGG TCCGTGACCT GGGTGGACGQ TCqGGTTGQA Ca^TCCCGQGG 
100201 OAGCCTCTGO CATGGTCGCC CGTCCGTCCC CCTCAAGAAC CGAAGOGAGC 
100251 GTCACGATCA CGATGATC6A AOTCAGCACG CGCAQC3VTGA AGGAAGCGGC 
100301 TCCCGCCQAO CAGCTCCGCO CGGAGACCAC GACACTGGAC ATTCCAAttSG 
100351 GTTTCGACCT OTGGACQGCC GACGAGATCQ CQGAGTGGCT CGACGGCGTC 
100401 GAGGACGACC CGGCAGTCTC cSaCGCCGAC TTCTACGCGQ CCCAGCAGCO 
100451 GTGCGACQGG TCCTCGGCAC CGAGGGCACC tgacccgccg GCGGCCCTGC
100501 GCGGCCCTAC GTOTGCAGCG CCCCGTCCTC CTCCACATOC CCCTCCGGCT 
100551 CCftflCTGOAT CQTCGASTGG GCCACQTCGA AGTGGCCCCC GACACACCGC 
100601 TGAAGGCQCC CCS^GGAGCTC CCCGTACCCG CTCGCGAGAG CCTCCTCCGT 
100651 GACO^CCACG TGCGCX3GTGA GCACCGGCAT CCCCGAGGTG ACCGTCCAQC 
100701 CGTGCAGATC GTQCACGQCO ACCACQCCCC GCTCCTCCAO CAGGTGCCGG 
100751 CGCACCTCGC CQAQGTCGAC QTCCTGCGGG GTCGCCTCCA QCAOGACQTQ 
100801 CAAGOAOTCC CGCAGCASOC CGTACGCGCG CGGCACQATC AGCAGGCCGA 
100851 TQACGATCGA CGCGATCQGG TCGGCGGCCT GCCSVCCCCQT GRflCAQGATQ 
100901 ACCAOGCCOC CCACGATCAC CGCQACCGAO CCQAGCQCGT CGCCCAGCAC
100951 CTCCAGGTAC GCGCCCC6CA GATTQAGGCT CTTCTCCTTG GCQTCCCGCA 
101001 GCAGCCACAG GCCCACCAGG TTCGCGGCGA GCCCGCCCAG CGCQACCACG 
101051 AACATCAGGC CGCCCTTCAC CTCCaCCXSGC TCOCTGAACC GGCCGATCGC 

101101 CGACCACAOG ACCCAGGCGA AQATGACGAC CAGGAGCaGC GCX3TTCAGGA 
101151 CCGCGOAGAA GATCTCCACG CGGTAGAACC CAAAGGTQCG CCGCGGCGTC 
101201 GGCGCCCGCT GGGCGAGGGT GATGGCACCX3 RGGGCCftCOQ AQACOCCGAC- 
101251 GG06TCGGTC AGGCTGTGGG CQGCOTGGGC GAGCAGCGCO AGGCTGCCGG
101301 ACAGGAGCGC QCCGACCACC TGGATGACXSG TGATCGAGC3C OCTCATGCOG 
101351 ATGGTCCACA GCAGGCGCTT QCQGTACGTG CCGCTQAGftG TGCCGCCCGC
101401 CGCCCCGGCG GACGGACCGT GGTCGTGCCC CATGCCCGCG AGTGOftCCAC 
101451 GGCQQCGCQG CACCC3CCAC CGAlQCGGCCG CCGGTCGQCT CAGTGCAGCC 
101501 GGGCCTGGGT GGAGOTGTCG CGCTGGTGCG GGATGCCGRG CGGCGGCGGC 
101551 AGCTC6CCCT OCTCCACCCT OACCGTGCGC ACGQQGGCGG GGACCCQGAT
101601 GCCCTCGGCG CGGTAACQCT GGTCCAGGCO CTTGATGAAC TCGTOCTTGA
101651 TOCOGTACTG GTCX3CTGAAC TCGCCGACGC CQAGGATCAC CGTGAAGCTG 
101701 ATCCGCQAGT CGCC6AAQGT dfoGAAGCGG ATCQCCGCCT CGTGGTCGGG 
101751 SACXDGCGCCG 6TGATCTCGG CCATCACCTC GTCCACCACC TOQGTCGTGA
101801 CCTTCrCOAC CTOCTCCAGQ TCGCTQTCGT AGCT6ACCCC GACCTGCACC 
101851 ATGATCGACA GCTCCTGCTC GGGGCGGCTG TAOTTGOTCA TGTTGGTGCC 
101901 GaCQAOCTTC GCQTTGQaGA TGATGACGAG GTTGTTGGaG AGCTGGCGGA 
101951 CCGTGGTGTT GCGCCAGTTG ATQTCGACQA CGTAQCCCTC CTCCCCGCTO 
102001 CTQAGCTGOA TGTAGTCGCC GGGCTGCACG GTCTTCGCGG CGAGGATGTG 
102051 CACGCCCGCG AAGASATTOG CGAQCGTQTC CTGCAGTGCG AGGGCGACCQ 
102101 CGAGACCTCC CACGCCQAGG GCGGTQAQCA QOGGTGCGAT GGAGATGCCG 
102151 AGOOTCTQAA GGACOATOAG GAJVGCCXa.TC GCGAGCACCA CGACGCGGGT 
102201 GATGTTCACG AAOATGGTGG CCGATCCGGC CACTCCQGRO COGOACTGTQ 
102251 CCAC6GCCTT CACCAGGCCG GTGACX3ATCC GGGCCGCCGT GAGCGTGGCG 
102301 GCCAGGATGA GCAGCGCGGT CAGCOTCATG GTGACGTTGC GTCCGGTGCG 
102351 OGGCGTGAeC QGCAGCGCGC CCGCCGCGGC GGCGAGCCCG GCGGTQATGQ 
102401 CCGCGCAQGG CACGMGGTO CGCAGQGCGT CGAOGATQAC OTCGTCACCG
102451 CTCCACCGGG TTTTGCTCGC CCGTTCQCCG AGCCACCTCA GftAGTGCGCO 
102501 GAGCftGCaOC CQGOCQACOA CXKXXSOCGAC GACCGCGATA CCGGCCACGA 

102551 TCCAGTCGTQ CAGTGTGAGG GCACGGGTCA TCAOTTCGCT CCXTGICQTAC 
102601 GGGGGGASTG CGCCTOTGTG GGGCGTATCT GATQTGACGT CACCTTGTGA 
102651 TACCTGCTCG ATTCCOaOGA GTQCQGTCAC GCCGGGACGA GAGCTCGGTT

102701 CCGGCGCGGA CGTCATCCTG CCCCATCCGC CCACGGCAGG CGTGCATACC 

102751 CCCACCTGGA TCTTC7VCAGA CCGGCCACGT CTGTCCATGC GCCGATGAGC 

102801 GCGCTGCCCG TGGTAAAGCA TTGAGTCAGG CGATTTGGCC ACTCGGCACT 

102851 CGGCGGACCG GTCGAGCCGG TCGATCTACQ TGAGCGGAGG CGGTTGAGCA 

102901 TGGCGTCCAT GTGCAGACCC GGAATGTCAC CCGTCAATTC GCACAACCAG 

102951 TGGGATCCGC TGGAGGAGAT CATCGTCGGG CGGCTGGAGG GCGCGACCAT 

103001 TCCCTCCAGC CATCCGGTCG TOGCGTGCAA CATCCCGACC TGGGCGGCAC 

103051 GGCTGCAGGG TCTCGCCGCC GGGTTCGAGT ATCCGCAGCG GCTGATCGAG 

103101 CCGGCGCAQC AGGAGCTCX3A CCAGTTCATC GCTCTCCTGC AATCCCTCGA 

103151 CGTCACAGTG AGACGGCCGG CGGCCGTCGA CCACAAGCAC CGCTTCGGGA 

103201 CCCCCGACTG GCAGTCGCGC GGCTTCTGCA ATTCCTGTCC GCGGGACAGC 

103251 ATGCTCGTCG TCGGCGACQA GATCATCGAG ACCCCGATGG CGTGGCCGTG 

103301 CCGCTGTTTC GAGACGCACT CGTAGCGCGA ACTCCTCAAG GACTACTTCC 

103351 QGCGCGGCGC GCGCTGGACG GCGGCGCCGC GCCCCCAQCT CACCGAGQCC 

103401 CTGTACGAGA AGGACTTCCG CCCTCCCGAG GAGGGCGAAC GATGCGCTAC 

103451 ATCCTCACCG AGTTCGAGCC GGTGTTCGAC GCGGCGGATT TCGTGCGGGC 

103501 GGGCCGCGAC CTGTTCGTGA CGCGGAGCAA CGTCGCCAAC CT6CTGGGCA 

103551 TCGAGTGGCT GGGCCGCGAC CTTCGGGCCG GAGTACCGCG TGCCACGAGA 
BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE



Professor P.F. Leadlay,
Department of Biochemistry,
University of Cambridge,
80 Tennis Court Road,
Cambridge.
CB2 1GA



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 
OF DEPOSITOR 



I. IDENTIFICATION OF THE MICROORGANISM

Identification reference given by the
DEPOSITOR:

Escherichia coli
XL1-Blue MR (MO-CN11)

Accession number given by the
INTERNATIONAL DEPOSITARY AUTHORITY:

NCIMB 40956







II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION 



The microorganism identified under I above was accompanied by:

[ ] a scientific description

[X] a proposed taxonomic designation
(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE 



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on
1 July 1998 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION 



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convett the original deposit to a deposit under the Budapest Treaty was received by it on 

(date of receipt of request for conversion) 


V. INTERNATIONAL DEPOSITARY AUTHORITY 


Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power to represent the
International Depositary Authority or of authorised
official(s):

Date: [illegible] 1998



1 Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 






BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE



Professor P.F. Leadlay,
Department of Biochemistry,
University of Cambridge,
80 Tennis Court Road,
Cambridge.
CB2 1GA



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 
OF DEPOSITOR 



I. IDENTIFICATION OF THE MICROORGANISM



Identification reference given by the 
DEPOSITOR: 

Escherichia coli 
XL1-Blue MR (MO-CN33)



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 



NCIMB 40957 



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION



The microorganism identified under I above was accompanied by: 



[ ] a scientific description

[X] a proposed taxonomic designation
(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE 



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on
1 July 1998 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION 



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convert the original deposit to a deposit under the Budapest Treaty was received by it on 

(date of receipt of request for conversion) 



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power to represent the
International Depositary Authority or of authorised
official(s):

Date: [day illegible] July 1998




1 Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 






BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE



Professor P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 

OF DEPOSITOR 



I. IDENTIFICATION OF THE MICROORGANISM



Identification reference given by the 
DEPOSITOR: 

Escherichia coli 
XL1-Blue MR (MO-CN02)



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 



NCIMB 40958 



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION 



The microorganism identified under I above was accompanied by: 



[ ] a scientific description

[X] a proposed taxonomic designation
(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on
1 July 1998 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION 



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convert the original deposit to a deposit under the Budapest Treaty was received by it on

(date of receipt of request for conversion) 



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power to represent the
International Depositary Authority or of authorised
official(s):

Date: [day illegible] July 1998




1 Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 






BUDAPEST TREATY ON THE INTERNATIONAL 
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS 
FOR THE PURPOSES OF PATENT PROCEDURE 



Dr. P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



NAME AND ADDRESS OF THE PARTY 
TO WHOM THE VIABILITY STATEMENT 
IS ISSUED 



INTERNATIONAL FORM 

VIABILITY STATEMENT 
issued pursuant to Rule 10.2 by the
INTERNATIONAL DEPOSITARY AUTHORITY
identified on the following page



I. DEPOSITOR

Name: AS ABOVE
Address:

II. IDENTIFICATION OF THE MICROORGANISM

Accession number given by the
INTERNATIONAL DEPOSITARY AUTHORITY:
NCIMB 40956

Date of the deposit or of the transfer¹:
1 July 1998


III. VIABILITY STATEMENT


The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was:

[X]³ viable

[ ]³ no longer viable





1 Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date
of the new deposit or date of the transfer).

2 In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test.

3 Mark with a cross the applicable box. 



Form BP/9 (first page) 






IV. CONDITIONS UNDER WHICH THE VIABILITY TEST HAS BEEN PERFORMED⁴



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power
to represent the International Depositary
Authority or of authorised official(s):

Date: [day illegible] July 1998

4 Fill in if the information has been requested and if the results of the test were negative.
BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS 
FOR THE PURPOSES OF PATENT PROCEDURE 



Dr. P.F. Leadlay,
Department of Biochemistry,
University of Cambridge,
80 Tennis Court Road,
Cambridge.
CB2 1GA



NAME AND ADDRESS OF THE PARTY 
TO WHOM THE VIABILITY STATEMENT 
IS ISSUED 



INTERNATIONAL FORM 

VIABILITY STATEMENT 
issued pursuant to Rule 10.2 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified on the following page 



I. DEPOSITOR

Name: AS ABOVE
Address:

II. IDENTIFICATION OF THE MICROORGANISM

Accession number given by the
INTERNATIONAL DEPOSITARY AUTHORITY:
NCIMB 40957

Date of the deposit or of the transfer¹:
1 July 1998


III. VIABILITY STATEMENT


The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was:

[X]³ viable

[ ]³ no longer viable





1 Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date 
of the new deposit or date of the transfer). 

2 In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test 

3 Mark with a cross the applicable box. 



Form BP/9 (first page) 






IV. CONDITIONS UNDER WHICH THE VIABILITY TEST HAS BEEN PERFORMED⁴



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power
to represent the International Depositary
Authority or of authorised official(s):

Date: 9 July 1998

4 Fill in if the information has been requested and if the results of the test were negative.






BUDAPEST TREATY ON THE INTERNATIONAL 
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS 
FOR THE PURPOSES OF PATENT PROCEDURE 



Dr. P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



NAME AND ADDRESS OF THE PARTY 

TO WHOM THE VIABILITY STATEMENT 
IS ISSUED 



INTERNATIONAL FORM 

VIABILITY STATEMENT
issued pursuant to Rule 10.2 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified on the following page 



I. DEPOSITOR

Name: AS ABOVE
Address:

II. IDENTIFICATION OF THE MICROORGANISM

Accession number given by the
INTERNATIONAL DEPOSITARY AUTHORITY:
NCIMB 40958

Date of the deposit or of the transfer¹:
1 July 1998


III. VIABILITY STATEMENT 



The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was:

[X]³ viable

[ ]³ no longer viable



1 Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date
of the new deposit or date of the transfer).

2 In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test.

3 Mark with a cross the applicable box.

Form BP/9 (first page) 






IV. CONDITIONS UNDER WHICH THE VIABILITY TEST HAS BEEN PERFORMED⁴



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,
Address: 23 St Machar Drive,
Aberdeen,
AB24 3RY,
Scotland.

Signature(s) of person(s) having the power
to represent the International Depositary
Authority or of authorised official(s):

Date: [day illegible] July 1998

4 Fill in if the information has been requested and if the results of the test were negative.



INTERNATIONAL SEARCH REPORT 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 C12N15/52 C12N15/76 C12P17/18 C12P19/44 C12P19/62
C12Q1/68 C07H17/08 C07H19/01

International Application No
PCT/GB 00/02072

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

IPC 7 C12N C12P C12Q C07H



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

EPO-Internal, MEDLINE, STRAND, EMBL, BIOSIS, EMBASE, WPI Data, PAJ, CHEM ABS Data



C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category

Y

X

Y

Citation of document, with indication, where appropriate, of the relevant passages

DONOVAN M 0 ET AL.: "Isolation of DNA
involved in monensin biosynthesis by
Streptomyces cinnamonensis."
ABSTR. ANNU. MEET. AM. SOC. MICROBIOL. 88
MEET.,
1988, page 261 XP000949887
abstract

ARROWSMITH T J ET AL.: "Characterisation
of actI-homologous DNA encoding polyketide
synthase genes from the monensin producer
Streptomyces cinnamonensis."
MOLECULAR AND GENERAL GENETICS,
vol. 234, no. 2, August 1992 (1992-08),
pages 254-264, XP002149754
page 263, right-hand column, line 1-5



[X] Further documents are listed in the continuation of box C.

Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed



Relevant to claim No.



1-3,6-14 



30-38 
1-3,6-14 



30-38 



Patent family members are listed in annex.

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
FURTHER INFORMATION CONTINUED FROM PCT/ISA/210



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1,8-12,14,43,44 (all partially); 2-7,13,15-42,45 (all completely)



A DNA sequence comprising the complete monensin (mon) gene
cluster, or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with one of the peptides 
according to SEQ ID NOs 12-33 (AcpX to MonAX as set out in 
table II), provided that said polypeptide is not all or part 
of amino acid 1-920 encoded by monAI. Vectors, transformed 
cells, hybridization probes and their uses.

Use of mon genes to control expression (monRI), to effect 
chain release (monAIX and monAX), to provide a desired 
stereochemical outcome (monBI and monBII), or to provide 
epoxidase or cyclase activity (monCI and monCII). Mon
polypeptides having isomerase activity (MonBI and MonBII),
or having chain terminating activity (MonAIX or MonAX), or
having epoxidase activity (MonCI), or having cyclase
activity (MonCII).

Processes for producing polyketides involving monensin 
loading or extension modules or domains. DNA sequences
encoding hybrid polyketide synthases containing one or more 
monensin modules or domains (provided that it is not 
encoding an ery loading module, the first and second ery 
extension modules and the ery chain-terminating thioesterase 
in which the AT domain of the first ery extension module has 
been substituted by the ethyl malonyl-CoA AT from the 
monensin synthase), polyketide synthases encoded by said DNA 
sequences, and polyketide compounds produced by said 
polyketide synthases. Vectors and transformed cells. 

Methods of producing S. cinnamonensis capable of producing 
enhanced levels of monensin by overexpressing or amplifying 
the monRI gene, S. cinnamonensis strains produced thereby,
and use of said strains in monensin production. 

Process for expressing a heterologous gene, e.g., a PKS
gene, in S. cinnamonensis under the control of monRI. 



2. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:5 (GdhA as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

3. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO:6 (DapA as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

4. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:7 (Orf3 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

5. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:8 (Orf4 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



6. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:9 (Orf5 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



7. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:10 (Orf6 as set out in table II), vectors, 
transformed cells, hybridization probes and their uses. 



8. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:11 (Orf7 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

9. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO:34 (Orf29 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

10. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO:35 (LipB as set out in table II), vectors,
transformed cells, hybridization probes and their uses.



11. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO:36 (Orf31 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



12. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO:37 (Orf32 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



13. Claims: 1,8-12,14 (all partially) 

A DMA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO:38 (AmtA as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 

14. Claims: 43,44 (both partially) 

Process for expressing a heterologous gene, e.g., a PKS 
gene, in S. cinnamonensis under the control of actII/orf4. 